2019
DOI: 10.1038/s41597-019-0151-1

AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds

Abstract: Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds has a key role in various domains, including but not limited to drug discovery, paint, coating, and battery materials design. Measurement and prediction of aqueous solubility is a complex and prevailing challenge in chemistry. For the latter, different data-driven prediction models have recently been developed to augment the physics-based modeling approaches. To construct accurate data-driv…

Citations: cited by 143 publications (154 citation statements)
References: 21 publications
“…Data points for MP, BP, log S, and PP were initially collected from CRC handbook of chemistry and physics 97th edition (CRC handbook) in which inorganic compounds and organometallics used in laboratory and industry were selected. 27 Additional data points for MP and PP were obtained from the work of Igor V. Tetko et al., 28 and further log S data points from AqSolDB, 29 which were filtered into three groups to select only inorganics and organometallics: (1) compounds without carbon, (2) compounds without hydrogen, and (3) compounds with a metal atom (Fig. S3†).…”
Section: Data Collection
Citation type: mentioning
confidence: 99%
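The three-group filter described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the statement does not say how compounds were represented, so the sketch assumes plain molecular-formula strings, and the names `METALS`, `elements_in`, `filter_groups`, and `is_selected` are hypothetical. The metal list is deliberately partial.

```python
import re

# Assumption: a small, incomplete set of metal symbols, for illustration only.
METALS = {
    "Li", "Na", "K", "Mg", "Ca", "Al", "Ti", "Cr", "Mn", "Fe",
    "Co", "Ni", "Cu", "Zn", "Ag", "Sn", "Pt", "Au", "Hg", "Pb",
}

# An element symbol is one uppercase letter, optionally followed by one lowercase letter.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def elements_in(formula: str) -> set:
    """Return the set of element symbols in a formula such as 'Fe(C5H5)2'."""
    return set(ELEMENT.findall(formula))


def filter_groups(formula: str) -> set:
    """Return which of the three groups (1, 2, 3) a compound falls into."""
    elems = elements_in(formula)
    groups = set()
    if "C" not in elems:   # (1) compounds without carbon
        groups.add(1)
    if "H" not in elems:   # (2) compounds without hydrogen
        groups.add(2)
    if elems & METALS:     # (3) compounds with a metal atom
        groups.add(3)
    return groups


def is_selected(formula: str) -> bool:
    """A compound is kept if it belongs to at least one group."""
    return bool(filter_groups(formula))
```

Under these assumptions, `is_selected("C6H6")` is false because benzene contains carbon and hydrogen and no metal, while `filter_groups("Fe(C5H5)2")` contains only group 3. A production filter would more likely work on SMILES or structure files and use a complete metal list.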
“…By having unique structural properties, every subfield yet requires tailored procedures for virtual screening. Moreover, in order to apply AI methods in disparate material sub-fields, firstly, it is necessary to produce a sufficient amount of high fidelity and quality data from experimental or computational studies 17.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In the current work, to develop an accurate solubility prediction model, we focus on the effects of data size and data quality on the prediction performance of ML models. Starting with the design of a quality-oriented data selection method that extracts the most accurate part of the data, and applying it on AqSolDB [12] - the largest publicly available solubility dataset that has been curated by using multiple data sources - the Aqueous Fig. 1 The categorization of the affecting factors for solubility predictions and their relationship with the actual and observed performances. a A three-layered structure showing the categorization of the affecting factors on the accuracy of solubility prediction ML models.…”
Section: Training and Testing The Model
Citation type: mentioning
confidence: 99%
“…Since there had been very few solubility data publicly available, studies on solubility prediction have been limited with a few thousands of compounds for training and a few hundreds of compounds for testing [8,17]. With an increase in public data resources, such as AqSolDB [12] consisting of a diverse set of ∼10⁴ compounds, it is becoming more feasible to conduct reliable testing studies to improve the accuracies of the data-driven models.…”
Section: The Size Of Data
Citation type: mentioning
confidence: 99%