Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P

L. D. Hughes, D. S. Palmer, F. Nigsch, J. B. Mitchell

Research output: Contribution to journal › Article

105 Citations (Scopus)

Abstract

This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r² = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r² = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r² = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.
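
The two statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions, RMSE as the square root of the mean squared residual and the coefficient of determination as 1 − SS_res/SS_tot; the abstract does not spell out the exact formulas used, and some QSPR studies report the squared Pearson correlation instead. The function names and the toy values below are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy values for illustration only, not data from the study.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.8]
    print(f"RMSE = {rmse(obs, pred):.3f}, r2 = {r_squared(obs, pred):.3f}")
```

RMSE carries the units of the property (°C for melting point, log units for Log S and Log P), so the three RMSE values in the abstract are not directly comparable with one another, whereas r² is dimensionless.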
Language: Undefined/Unknown
Pages: 220-232
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 48
Issue number: 1
DOI: 10.1021/ci700307p
Publication status: Published - 1 Jan 2008

Keywords

  • QSPR models
  • aqueous solubility
  • solubility

Cite this

@article{c4bdc76f604e46b2a73a193fa0167d16,
title = "Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P",
abstract = "This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r2 = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 degrees C, r2 = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r2 = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.",
keywords = "QSPR models, aqueous solubility, solubility",
author = "Hughes, {L. D.} and Palmer, {D. S.} and Nigsch, F. and Mitchell, {J. B.}",
year = "2008",
month = "1",
day = "1",
doi = "10.1021/ci700307p",
language = "Undefined/Unknown",
volume = "48",
pages = "220--232",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "1"
}

Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P. / Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B.

In: Journal of Chemical Information and Modeling, Vol. 48, No. 1, 01.01.2008, p. 220-232.

TY - JOUR

T1 - Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P

AU - Hughes, L. D.

AU - Palmer, D. S.

AU - Nigsch, F.

AU - Mitchell, J. B.

PY - 2008/1/1

Y1 - 2008/1/1

N2 - This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r2 = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 degrees C, r2 = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r2 = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.

AB - This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r2 = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 degrees C, r2 = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r2 = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.

KW - QSPR models

KW - aqueous solubility

KW - solubility

U2 - 10.1021/ci700307p

DO - 10.1021/ci700307p

M3 - Article

VL - 48

SP - 220

EP - 232

JO - Journal of Chemical Information and Modeling

T2 - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 1

ER -