Crystallisation thermodynamics and random forest classification for the prediction of crystallisation outcomes

  • Siya Nakapraves

Student thesis: Doctoral Thesis


Crystallisation is one of the key unit operations in the pharmaceutical industry. A wide range of crystal attributes affects the bulk particle properties of a crystalline material as well as its downstream manufacturability. Understanding and controlling the crystallisation process to achieve the desired quality attributes is therefore of significant interest. This thesis investigated the potential of machine learning techniques for predicting crystallisation outcomes, focusing on the shapes of mefenamic acid (MFA) crystals grown from various organic solvents and on the solvated structures of small organic molecules as represented by their Powder X-ray Diffraction (PXRD) patterns. The solubility and nucleation of MFA were also explored in an attempt to understand the thermodynamic and kinetic interactions during its crystallisation. The nucleation of MFA in methanol, ethanol, 2-propanol, 2-butanol, acetone, and tetrahydrofuran (THF) was observed to follow a two-step mechanism, in which the crystals nucleate within metastable clusters. A comparison between the surface free energy determined from nucleation rates and that calculated by Turnbull’s rule also suggests that the crystals nucleated faster via two-step nucleation than classical nucleation theory (CNT) would predict, owing to the smaller nucleation barrier. For the prediction of crystallisation outcomes, the results showed that random forest classification models using solvent physical property descriptors can reliably predict crystal morphologies for MFA crystals grown in 20 of the 28 solvents included in this work. Further characterisation of the crystals grown in the remaining 8 solvents, for which model performance was poor, also led to the discovery of a new THF-solvated form of MFA.
The ability of machine learning to predict the solvated form of small organic molecules from PXRD patterns derived from the Cambridge Structural Database (CSD) was also investigated. The best model in this study achieved a prediction accuracy of 68.74%. These findings demonstrate the potential of machine learning and data mining to assist decision-making in crystallisation while reducing the material and time consumed during process development.
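For readers unfamiliar with the link between surface free energy and the nucleation barrier mentioned above, the standard textbook CNT expressions for a spherical nucleus are sketched below. They are not reproduced from the thesis, and the symbols are assumptions introduced here: $J$ is the nucleation rate, $A$ a kinetic pre-exponential factor, $\gamma$ the effective surface free energy, $v$ the molecular volume, $S$ the supersaturation ratio, $k_B$ the Boltzmann constant and $T$ the absolute temperature.

```latex
% Textbook CNT relations (spherical nucleus); notation is assumed, not taken from the thesis.
\begin{align}
  J &= A \exp\!\left(-\frac{\Delta G^{*}}{k_B T}\right), \\
  \Delta G^{*} &= \frac{16\pi\,\gamma^{3} v^{2}}{3\,\left(k_B T \ln S\right)^{2}} .
\end{align}
```

Because $\Delta G^{*}$ scales with $\gamma^{3}$, an effective surface free energy lower than the reference estimate corresponds to a markedly smaller barrier and, at a given supersaturation, to a higher nucleation rate.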
Date of Award: 23 Feb 2023
Original language: English
Awarding Institution
  • University of Strathclyde
Supervisor: Alastair Florence (Supervisor) & John Robertson (Supervisor)
