Prediction of mefenamic acid crystal shape by random forest classification
Abstract
Purpose: This study describes the development and application of machine-learning models to the prediction of the crystal shape of mefenamic acid recrystallized from organic solvents.
Method: Mefenamic acid crystals were grown in 30 different solvents and categorized by crystal shape as either polyhedral or needle. A total of 87 random forest classification models were trained on these data. Initially, three models were built to assess the efficacy of the method. These models were trained on datasets containing Molecular Operating Environment (MOE) descriptors for the solvents and crystal shape labels obtained by visual inspection of microscope images. The subsequent 84 models tested the prediction accuracy for individual solvents, each of which was sequentially excluded from the model training set. Three sets of MOE descriptors were compared (one containing all available 2D descriptors, a second focused on molecular structure and a third focused on physical properties) to determine which gave the highest overall prediction accuracy across the different solvents.
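The "sequentially excluded" evaluation described above is a leave-one-group-out scheme in which the group is the solvent. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes each row is one crystallisation experiment (solvent descriptors plus supersaturation), that a descriptor set can be expressed as a list of column indices, and it uses a hypothetical 1-nearest-neighbour stand-in (`NearestNeighbour`) only so that it runs without third-party packages. The toy data, column names and helper names are all made up.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Sequence


class NearestNeighbour:
    """Hypothetical 1-nearest-neighbour stand-in so the sketch runs without
    scikit-learn. It is NOT a random forest; swap in a real one if available."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestNeighbour":
        self._X = [list(row) for row in X]
        self._y = list(y)
        return self

    def predict(self, X: List[List[float]]) -> List[str]:
        def closest_label(row: List[float]) -> str:
            # Squared Euclidean distance to every training row.
            dists = [sum((a - b) ** 2 for a, b in zip(row, t)) for t in self._X]
            return self._y[dists.index(min(dists))]

        return [closest_label(row) for row in X]


def select_columns(X: List[List[float]], columns: Sequence[int]) -> List[List[float]]:
    """Keep only the columns (by index) that make up one descriptor set."""
    return [[row[c] for c in columns] for row in X]


def leave_one_solvent_out(
    X: List[List[float]],
    y: List[str],
    solvents: List[str],
    make_model: Callable[[], Any],
    columns: Optional[Sequence[int]] = None,
) -> Dict[str, float]:
    """For each solvent, train a fresh model on all other solvents' rows and
    return the fraction of the held-out solvent's labels predicted correctly."""
    if columns is not None:
        X = select_columns(X, columns)
    rows_by_solvent = defaultdict(list)
    for i, solvent in enumerate(solvents):
        rows_by_solvent[solvent].append(i)

    accuracy: Dict[str, float] = {}
    for held_out, test_idx in rows_by_solvent.items():
        train_idx = [i for i, s in enumerate(solvents) if s != held_out]
        model = make_model()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = model.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predicted, test_idx))
        accuracy[held_out] = correct / len(test_idx)
    return accuracy


# Toy, made-up data: one row per crystallisation experiment,
# columns = [descriptor_a, descriptor_b, supersaturation].
solvents = ["A", "A", "B", "B", "C", "C"]
X = [
    [1.0, 0.0, 1.1], [1.0, 0.0, 1.5],
    [1.1, 0.1, 1.2], [1.1, 0.1, 1.6],
    [5.0, 4.0, 1.1], [5.0, 4.0, 1.5],
]
y = ["needle", "needle", "needle", "needle", "polyhedral", "polyhedral"]

print(leave_one_solvent_out(X, y, solvents, NearestNeighbour))
# prints {'A': 1.0, 'B': 1.0, 'C': 0.0}
```

If scikit-learn is installed, a random forest can be passed in as `make_model=lambda: RandomForestClassifier(random_state=0)` (from `sklearn.ensemble`), since it exposes the same `fit`/`predict` methods; `sklearn.model_selection.LeaveOneGroupOut` with `groups=solvents` produces the same splits. In the toy run, solvent C has descriptor values unlike any training solvent and is predicted wrongly, which illustrates (without proving anything about the real data) how a solvent poorly covered by the training data can score badly. Running the function once per descriptor set's `columns` gives one held-out accuracy per solvent per set.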
Results: Of the initial three models, the best achieved a crystal shape prediction accuracy of 93.5%, as assessed by 4-fold cross-validation. When solvents were sequentially excluded from the training data, 32 of the 84 models predicted the shape of mefenamic acid crystals for the excluded solvent with 100% accuracy, and a further 21 models achieved accuracies between 50% and 100%. Models using only solvent physical property descriptors and supersaturations gave higher overall prediction accuracies than both the models using atom count, bond count and pharmacophore descriptors and the models using all solvent molecular descriptors. Further characterisation of crystals grown in the 8 solvents for which the models performed poorly (<50% accuracy) led to the discovery of a new mefenamic acid solvate; all other crystals were the previously known form I.
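The Results group held-out runs into three accuracy bands and compare descriptor sets by overall accuracy. The sketch below shows that bookkeeping under explicit assumptions: the middle band is taken as at least 50% but below 100%, and "overall accuracy" is assumed to be the mean of per-solvent accuracies (the study may instead pool predictions over all crystals). The descriptor-set names `all_2d`, `structure` and `physical` and all accuracy values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def summarise_bands(accuracies: Iterable[float]) -> Dict[str, int]:
    """Count held-out runs that were perfect, at least 50% but below 100%, or below 50%."""
    counts = {"100%": 0, "50% to <100%": 0, "<50%": 0}
    for acc in accuracies:
        if acc == 1.0:
            counts["100%"] += 1
        elif acc >= 0.5:
            counts["50% to <100%"] += 1
        else:
            counts["<50%"] += 1
    return counts


def rank_descriptor_sets(results: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank descriptor sets by mean held-out accuracy across solvents (highest first).
    Assumption: 'overall accuracy' = unweighted mean over solvents."""
    means = {name: sum(acc.values()) / len(acc) for name, acc in results.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-solvent accuracies for three descriptor sets.
results = {
    "all_2d": {"A": 1.0, "B": 0.0, "C": 0.5},
    "structure": {"A": 0.5, "B": 0.0, "C": 1.0},
    "physical": {"A": 1.0, "B": 1.0, "C": 0.5},
}
all_runs = [acc for per_solvent in results.values() for acc in per_solvent.values()]
print(summarise_bands(all_runs))
# prints {'100%': 4, '50% to <100%': 3, '<50%': 2}
print(rank_descriptor_sets(results))
```

Applied to the reported counts, and assuming the three bands cover every run, 84 - 32 - 21 = 31 of the 84 held-out runs fell below 50% accuracy across the three descriptor sets.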
Conclusion: Random forest classification models using solvent physical property descriptors can reliably predict the morphology of mefenamic acid crystals grown in 20 of the 28 solvents included in this work. The poor prediction accuracy for the remaining 8 solvents may indicate that these solvents are outliers because of factors not adequately covered by the training data.
Original language | English |
---|---|
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | Pharmaceutical Research |
Early online date | 19 Dec 2022 |
DOIs | |
Publication status | E-pub ahead of print - 19 Dec 2022 |
Keywords
- mefenamic acid
- crystallisation
- random forest classification
- crystal shape prediction
Fingerprint
Dive into the research topics of 'Prediction of mefenamic acid crystal shape by random forest classification'. Together they form a unique fingerprint.