Is experimental data quality the limiting factor in predicting the aqueous solubility of druglike molecules?

David S. Palmer, John B. O. Mitchell

Research output: Contribution to journal › Article › peer-review

58 Citations (Scopus)
97 Downloads (Pure)


We report the results of testing Quantitative Structure-Property Relationships (QSPR) that were trained upon the same druglike molecules but two different sets of solubility data: (i) data extracted from several different sources from the published literature, for which the experimental uncertainty is estimated to be 0.6-0.7 log S units (referred to mol/l); (ii) data measured by a single accurate experimental method (CheqSol), for which experimental uncertainty is typically < 0.05 log S units. Contrary to what might be expected, the models derived from the CheqSol experimental data are not more accurate than those derived from the “noisy” literature data. The results suggest that, at the present time, it is the deficiency of QSPR methods (algorithms and/or descriptor sets), and not, as is commonly quoted, the uncertainty in the experimental measurements, which is the limiting factor in accurately predicting aqueous solubility for pharmaceutical molecules.
Original language: English
Pages (from-to): 2962–2972
Number of pages: 11
Journal: Molecular Pharmaceutics
Issue number: 8
Early online date: 9 Jul 2014
Publication status: Published - 4 Aug 2014


Keywords

  • pharmaceutical
  • rule-of-five
  • solubility
  • bioavailability
  • QSPR
  • QSAR
  • druglike
  • ADME
  • Random Forest
  • dissolution
  • experimental error
  • CheqSol
  • Noyes–Whitney
  • Henderson–Hasselbalch
  • polymorph
  • crystal
  • machine learning
  • general solubility equation
