We report the results of testing Quantitative Structure-Property Relationships (QSPR) that were trained upon the same druglike molecules but two different sets of solubility data: (i) data extracted from several different sources from the published literature, for which the experimental uncertainty is estimated to be 0.6-0.7 log S units (referred to mol/l); (ii) data measured by a single accurate experimental method (CheqSol), for which experimental uncertainty is typically < 0.05 log S units. Contrary to what might be expected, the models derived from the CheqSol experimental data are not more accurate than those derived from the “noisy” literature data. The results suggest that, at the present time, it is the deficiency of QSPR methods (algorithms and/or descriptor sets), and not, as is commonly quoted, the uncertainty in the experimental measurements, which is the limiting factor in accurately predicting aqueous solubility for pharmaceutical molecules.
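The expectation that this result contradicts can be illustrated with a short calculation. The sketch below is not the authors' analysis: it assumes that model error and experimental noise are independent and add in quadrature, and the intrinsic model error of 0.9 log S units is a hypothetical value chosen only for illustration. The function name `observed_rmse` is likewise an assumption.

```python
import math


def observed_rmse(model_error: float, experimental_noise: float) -> float:
    """RMSE measured against noisy data, ASSUMING independent errors
    that add in quadrature (an illustrative assumption, not from the paper)."""
    return math.sqrt(model_error ** 2 + experimental_noise ** 2)


# Hypothetical intrinsic model error in log S units (NOT a value from the paper).
MODEL_ERROR = 0.9

# Experimental uncertainties quoted in the abstract, in log S units.
LITERATURE_NOISE = 0.65  # midpoint of the 0.6-0.7 estimate
CHEQSOL_NOISE = 0.05     # upper bound of the "< 0.05" estimate

rmse_literature = observed_rmse(MODEL_ERROR, LITERATURE_NOISE)
rmse_cheqsol = observed_rmse(MODEL_ERROR, CHEQSOL_NOISE)

print(f"expected RMSE vs literature data: {rmse_literature:.2f} log S units")
print(f"expected RMSE vs CheqSol data:    {rmse_cheqsol:.2f} log S units")
```

Under these assumptions, replacing the literature data with CheqSol data would be expected to lower the measured error noticeably. The abstract reports that no such improvement was observed.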
- Random Forest
- experimental error
- machine learning
- general solubility equation
Palmer, D. S., & Mitchell, J. B. O. (2014). Is experimental data quality the limiting factor in predicting the aqueous solubility of druglike molecules? Molecular Pharmaceutics, 11(8), 2962–2972. https://doi.org/10.1021/mp500103r