2013
DOI: 10.5936/csbj.201302007

Multivariate Linear QSPR/QSAR Models: Rigorous Evaluation of Variable Selection for PLS

Abstract: Basic chemometric methods for making empirical regression models for QSPR/QSAR are briefly described from a user's point of view. Emphasis is given to PLS regression, simple variable selection and a careful and cautious evaluation of the performance of PLS models by repeated double cross validation (rdCV). A demonstration example is worked out for QSPR models that predict gas chromatographic retention indices (values between 197 and 504 units) of 209 polycyclic aromatic compounds (PAC) from molecular descripto…

citations
Cited by 33 publications
(25 citation statements)
references
References 24 publications
“…SpMax3_Bh(m) has been used in predicting depuration rate constants for environmental pollutants of the polychlorinated biphenyls group (87), and the less relevant (in our case) SpMax6_Bh(m) has been used to predict chronic toxicity of substances to Pseudokirchneriella subcapitata (88). The second most important descriptor for our data set was DECC (eccentric topologic index) has been previously reported to be important in the prediction of MAO-A activity (89,90), placental barrier permeability (91), and gas chromatographic retention times (92). F06 [C-N] was used in a model to describe the anti-proliferative effect of phenyl 4-(2-oxoimidazolidin-1yl)-benzenesulfonates (local QSAR model) (93), anti-malaric effect (94), or skin permeability of substances (95).…”
Section: Discussion (mentioning)
confidence: 98%
“…The QSAR models are built using multiple algorithms. Independent input variables are sets of quantitative characteristics of molecules included in the studies and called descriptors. They represent topological properties of molecular graphs, describing three‐dimensional molecular fields and interactions with target proteins, just physicochemical properties, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…Dependent input variables are biological activities, toxicities and other properties which the models should be able to predict with sufficiently high accuracy. There are other steps of preprocessing and curation of descriptors not discussed here in detail. The models are built using different QSAR methodologies (for example, see the publication by K. Varmuza et al …”
Section: Introduction (mentioning)
confidence: 99%
“…PLS algorithm was chosen since it is largely used in medicinal chemistry [17]. Machine learning tools on the other hand are largely used by biologists and have also been shown to have potential utility in the modelling of pharmaceutical problems [18].…”
Section: Introduction (mentioning)
confidence: 99%