Physics-based and deep learning approaches for the determination of solution thermodynamic parameters

Student thesis: Doctoral Thesis

Abstract

Thermodynamic parameters associated with dissolution are frequently obtained via cheminformatics models; however, such an approach requires a significant quantity of training data and does not guarantee that the model will perform competently for molecules beyond a threshold of dissimilarity to those used during training. Physics-based approaches, in turn, often include a range of approximations that limit their accuracy. There is thus much scope to develop new models which avoid these pitfalls. In Chapter 5, a physics-based approach for the prediction of intrinsic aqueous solubility is proposed. This proof-of-concept was developed for use with the sublimation thermodynamic cycle, and expands upon previous work by replacing several thermodynamic approximations with theoretically rigorous quantum mechanical calculations of the crystalline phase. Combining these with hydration free energies obtained from MD/FEP simulations or density functional theory leads to calculated solubilities that are comparable to both experiment and cheminformatics-based machine learning predictions. This approach also highlights how methods must be adapted to model different conformations in different phases, and the influence those conformations can have on any final solubility prediction.
In Chapter 6, the accurate prediction of solvation free energy using 1D-RISM, in a method referred to as pyRISM-CNN, is reported. With this approach, a 1D CNN trained on RISM correlation functions is combined with the 1D-RISM solver, pyRISM, to predict the solvation free energy of small organic and drug-like molecules across several organic or aqueous solvent systems at temperatures beyond 298 K. The pyRISM-CNN functional reduces the predictive error by up to 40-fold compared with standard 1D-RISM theory, with errors below 1 kcal/mol obtained for solutes in organic solvents at 298 K and water solvent systems at 273-373 K.
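As a rough illustration of the sublimation thermodynamic cycle mentioned for Chapter 5, the sketch below combines a sublimation free energy and a hydration free energy into a solution free energy and converts it to an intrinsic solubility. It assumes the commonly used relation ΔG_sol = ΔG_sub + ΔG_hyd = -RT ln(S0 Vm) with a 1 mol/L standard state, where Vm is the crystal molar volume. The function name, argument names and input values are hypothetical and are not taken from the thesis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def intrinsic_solubility(dg_sub, dg_hyd, molar_volume, temperature=298.15):
    """Sublimation-cycle estimate of intrinsic solubility.

    dg_sub, dg_hyd : free energies in kcal/mol (assumed 1 mol/L standard state)
    molar_volume   : crystal molar volume in L/mol
    Returns (dG_sol in kcal/mol, S0 in mol/L, log10 S0).
    """
    # Close the cycle: crystal -> gas -> aqueous solution.
    dg_sol = dg_sub + dg_hyd
    # Invert dG_sol = -RT ln(S0 * Vm) for S0.
    s0 = math.exp(-dg_sol / (R_KCAL * temperature)) / molar_volume
    return dg_sol, s0, math.log10(s0)


# Hypothetical inputs chosen only to exercise the function.
dg_sol, s0, log_s0 = intrinsic_solubility(dg_sub=10.0, dg_hyd=-6.0, molar_volume=0.1)
print(f"dG_sol = {dg_sol:.2f} kcal/mol, log10(S0) = {log_s0:.2f}")
```

Because solubility depends exponentially on ΔG_sol, an error of about 1.36 kcal/mol in either leg of the cycle at 298 K shifts the predicted solubility by roughly one log unit, which is why the accuracy of each free energy term matters so much in this kind of calculation.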
In Chapter 7, an extended version of the pyRISM-CNN methodology is developed to allow for the prediction of additional thermodynamic parameters in a wider range of solvents and environmental conditions. Firstly, the set of solvents in the training data has been expanded from carbon tetrachloride, water and chloroform to also include methanol. Secondly, solvation free energies have been introduced for organic molecular ions in methanol and water solvent systems at 298 K. For neutral solutes, prediction errors nearing or below 1 kcal/mol are obtained for each organic solvent system at 298 K and water solvent systems at 273-373 K. Errors below 4 kcal/mol are obtained for organic molecular ions without the need for corrections or additional descriptors. Lastly, pyRISM-CNN was successfully applied to the simultaneous prediction of solvation enthalpy, entropy and free energy through a multi-task learning approach, with errors of 1.04, 0.98 and 0.47 kcal/mol, respectively, for water solvent systems at 298 K.
There has been limited development of organic solvent models for use with RISM, with any development typically done on a solvent-by-solvent basis without a standardised procedure. The challenges faced when building organic solvent models within RISM often stem from convergence problems encountered during model development, or from the choice of Lennard-Jones parameters used to represent intermolecular interactions. In Chapter 8, a new method of parameterising coarse-grained organic solvent models for the accurate prediction of solvation free energy is proposed. This method models solvent molecules as LJ spheres, with representative solvent parameters determined through an extensive grid search of the most favourable energy surfaces calculated from MP2-based QM calculations. This approach has been tested with the standard 3D-RISM theory and pyRISM-CNN, from which promising results were obtained for a chloroform solvent model that can rival the accuracy of current atomistic solvent models.
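The idea of choosing Lennard-Jones parameters for a single-sphere solvent model by grid search, as outlined for Chapter 8, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: it assumes a 12-6 LJ form, scores each (epsilon, sigma) pair by RMSE against a reference energy curve, and uses a synthetic reference curve in place of MP2 energies. All function names, grid ranges and units (kcal/mol, Å) are hypothetical.

```python
import math


def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def rmse(params, distances, e_ref):
    """Root-mean-square deviation between the LJ curve and the reference energies."""
    eps, sig = params
    sq = [(lj_energy(r, eps, sig) - e) ** 2 for r, e in zip(distances, e_ref)]
    return math.sqrt(sum(sq) / len(sq))


def grid_search_lj(distances, e_ref, epsilons, sigmas):
    """Return (epsilon, sigma, rmse) for the grid point that best matches e_ref."""
    grid = [(eps, sig) for eps in epsilons for sig in sigmas]
    best = min(grid, key=lambda p: rmse(p, distances, e_ref))
    return best[0], best[1], rmse(best, distances, e_ref)


# Synthetic reference curve standing in for QM interaction energies (NOT thesis data).
distances = [3.5 + 0.25 * i for i in range(27)]        # 3.5 to 10.0 Å
e_ref = [lj_energy(r, 0.30, 4.50) for r in distances]  # generated with known parameters

epsilons = [0.05 * i for i in range(1, 13)]            # 0.05 to 0.60 kcal/mol
sigmas = [3.0 + 0.25 * i for i in range(13)]           # 3.0 to 6.0 Å
best_eps, best_sig, best_rmse = grid_search_lj(distances, e_ref, epsilons, sigmas)
print(f"epsilon = {best_eps:.2f}, sigma = {best_sig:.2f}, RMSE = {best_rmse:.2e}")
```

An exhaustive grid is used here because it is simple and cannot get trapped in a local minimum; with real reference energies the grid spacing would set the resolution of the fitted parameters.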
Date of Award: 3 Aug 2023
Original language: English
Awarding Institution: University of Strathclyde
Sponsors: University of Strathclyde
Supervisors: David Palmer & Craig Jamieson
