2017
DOI: 10.1021/acs.jcim.6b00625

In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

Abstract: Few toxicity data are available for the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present stud…

Cited by 142 publications (108 citation statements)
References 58 publications
“…Machine learning approaches are typically used to map the structure of compounds to their properties, a method called quantitative structure-activity relationship (QSAR). Common algorithms include multiple linear regression, random forest or support vector machine in combination with circular fingerprints or molecular properties to describe the molecules [3][4][5][6][7]. Water solubility and melting point are two endpoints for which a lot of previous modeling was published [3][4][5][8][9][10][11][12][13].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…This level of error is consistent with other current state-of-the-art log P prediction methods, which generally come from Quantitative Structure Activity Relationships (QSAR) [61,62]. On the left-hand side of Figure 1, the initialization process is expounded, which provides initial training data for the optimizer upon which to initialize its Gaussian process. In this phase, force field parameters are selected at random from a sampling grid constructed by the optimizer based upon user input.…”
Section: Literature Data (citation type: mentioning)
confidence: 99%
“…Their predictive accuracy varies with the chemistry involved, so the best way to assess their usefulness for a particular area of chemistry is to compare their predictions to the observed values for compounds one has recently encountered whose properties have been measured. Applicability domain information alone may not be adequate for herbicides, which are not well represented in many ADMET data sets. Moreover, training sets are likely to be biased towards either very well-behaved commercial compounds or notorious ones like methyl viologen and dioxins, for example, instead of the kind of lead compounds likely to be encountered in agrochemical discovery and development.…”
Section: Caveats (citation type: mentioning)
confidence: 99%
“…Applicability domain information alone may not be adequate for herbicides, which are not well represented in many ADMET data sets [49]. Moreover, training sets are likely to be biased towards either very well-behaved commercial compounds or notorious ones like methyl viologen and dioxins, for example, instead of the kind of lead compounds likely to be encountered in agrochemical discovery and development.…”
Section: Caveats (citation type: mentioning)
confidence: 99%