We present extended QSPR modeling of the solubilities of about 500 substances in series of up to 69 diverse solvents. The models were obtained with our new software package, CODESSA PRO, which provides an advanced variable selection procedure and a large pool of theoretically derived molecular descriptors. The squared correlation coefficients and squared standard deviations (variances) range from 0.837 and 0.1 for 2-pyrrolidone to 0.998 and 0.02 for dipropyl ether, respectively. The predictive power of the models was verified using the "leave-one-out" cross-validation procedure. The QSPR models presented are suitable for the rapid evaluation of solvation free energies of organic compounds.
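The "leave-one-out" check mentioned above can be illustrated with a minimal sketch. It assumes an ordinary multilinear regression on a descriptor matrix and is not the CODESSA PRO implementation; the function names (`fit_mlr`, `predict_mlr`, `r_squared`, `loo_q_squared`) and the synthetic data are hypothetical and serve only to show how a cross-validated squared correlation coefficient may be computed.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y on X with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with coefficients from fit_mlr (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def loo_q_squared(X, y):
    """Leave-one-out cross-validated R^2: each point is predicted
    by a model refitted on all the other points."""
    n = len(y)
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i from the training set
        coef = fit_mlr(X[keep], y[keep])
        y_loo[i] = predict_mlr(coef, X[i:i + 1])[0]
    return r_squared(y, y_loo)

# Synthetic stand-in for descriptors (X) and a solubility-like property (y);
# these numbers are illustrative only and are not data from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.5 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=40)

r2 = r_squared(y, predict_mlr(fit_mlr(X, y), X))   # fitted R^2
q2 = loo_q_squared(X, y)                           # cross-validated R^2
print(f"R2 = {r2:.3f}, LOO q2 = {q2:.3f}")
```

For a least-squares model the cross-validated value cannot exceed the fitted one, so a large gap between the two is a common warning sign of overfitting.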
BACKGROUND TO THE PRESENT SERIES OF PAPERS
Solubility is of the utmost significance in numerous areas of human endeavor and interest. Solubility in water is fundamental to environmental issues such as pollution, erosion, and mass transfer. Solubility in organic solvents forms much of the basis of the chemical industry. Solubility determines shelf life and cross contamination. It is critically linked to bioavailability and thus to the effectiveness of pharmaceuticals, biodegradation, suitability of gaseous anesthetics, blood substitutes, oxygen carriers, etc. Toxicity is critically dependent on solubility.
Very extensive studies have been carried out on the solubilities of various solute-solvent pairs, resulting in diverse theories of solute-solvent interactions that form the basis of our understanding of solubility. 1 These theories are based on concepts ranging from quantitative analysis to statistical mechanics and quantum mechanics. Quantitative treatments of solute-solvent interactions in series of compounds have attracted wide attention and have led to various models for explaining solute-solvent behavior. 2 Most of this work has involved studying a series of solutes dissolved in a single solvent. There are some instances in which the solubilities of a solute in a series of solvents have been examined, as reviewed elsewhere. 3,4 Many of the previous studies provide valuable contributions to the understanding of the general phenomena of solute-solvent interactions. In-depth comparisons of published data series have revealed that many gaps exist, which make any general comparison of solvent-solute pairs impossible when only experimental data are used. Therefore we have proposed the combination of quantitative structure-property/activity relationship analysis and subsequent principal component analysis for the general treatment of solubility.
5 A common procedure in quantitative structure-property/activity relationship (QSPR/QSAR) analysis is the application of variable selection methods such as stepwise forward selection, 6,7 genetic algorithms, 8,9 and simulated annealing 10,11 to reduce the descriptor space so that only the most influential descriptors are kept for the prediction of a property (in the present instance, solubility). In this first version of our general treatment of solubility w...