2019
DOI: 10.1155/2019/9858371

Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String

Abstract: A new method of Hansen solubility parameters (HSPs) prediction was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. In order to adapt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by…

Cited by 15 publications (10 citation statements)
References: 112 publications (172 reference statements)
“…MARS can be placed within the new scientific paradigm [ 115 ] of “data-driven modeling” [ 100 , 116 , 117 ], one of the foundations of machine learning techniques, and is defined as a bi-objective algorithm (elaborated as a “two-stage process”) [ 118 ] in which two different phases are distinguished [ 108 , 109 , 111 , 119 , 120 ]: forward selection and backward deletion. Formally, following Koc and Bozdogan [ 102 ] and Zhang and Goh [ 100 ], its working schema can be defined from Y, the output or objective-dependent variable response, and X, a matrix of j input variables (predictors), assuming that the data are generated under an “unknown and true model”.…”
Section: Methods (mentioning)
confidence: 99%
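
To make the forward-selection/backward-deletion idea in that statement concrete, the following is a minimal, hypothetical numpy sketch of a MARS-style fit: mirrored hinge basis functions are added greedily and then pruned. It is a simplified illustration only, not the MARSplines implementation in STATISTICA used by the cited authors; in particular, real MARS prunes with generalized cross-validation, which is replaced here by a crude RSS threshold.

import numpy as np

def hinge(x, knot, sign):
    # Hinge basis function max(0, sign * (x - knot)), the building block of MARS.
    return np.maximum(0.0, sign * (x - knot))

def rss(columns, y):
    # Residual sum of squares of a least-squares fit of y on the given basis columns.
    B = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ coef) ** 2))

def mars_sketch(X, y, max_terms=9):
    n, p = X.shape
    basis = [np.ones(n)]                                  # start with the intercept
    # Forward selection: greedily add the mirrored hinge pair that lowers RSS the most.
    while len(basis) < max_terms:
        best = None
        for j in range(p):
            for knot in np.unique(X[:, j]):
                cand = basis + [hinge(X[:, j], knot, +1.0), hinge(X[:, j], knot, -1.0)]
                err = rss(cand, y)
                if best is None or err < best[0]:
                    best = (err, j, knot)
        _, j, knot = best
        basis += [hinge(X[:, j], knot, +1.0), hinge(X[:, j], knot, -1.0)]
    # Backward deletion: repeatedly drop the term whose removal hurts RSS the least,
    # stopping when even the best deletion raises RSS by more than 5% (crude rule).
    while len(basis) > 1:
        errs = [rss(basis[:k] + basis[k + 1:], y) for k in range(1, len(basis))]
        k = int(np.argmin(errs)) + 1
        if errs[k - 1] > 1.05 * rss(basis, y):
            break
        del basis[k]
    coef, *_ = np.linalg.lstsq(np.column_stack(basis), y, rcond=None)
    return basis, coef

# Toy usage on synthetic data with a single kink at x = 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 2))
y = 2.0 * np.maximum(0.0, X[:, 0] - 0.5) + rng.normal(scale=0.05, size=80)
basis, coef = mars_sketch(X, y)
print(len(basis), "basis terms, coefficients:", np.round(coef, 3))
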
“…Features that correlate with predicting the amorphous behavior of pure APIs may also show importance in prediction models of API–carrier ASD systems. In another model solely focused on small molecule solubility properties, Przybyłek et al constructed a model to predict Hansen solubility parameters from a dataset of 130 compounds for which measured solvent solubility parameter data were available [ 144 ]. A large collection of connectivity features, indices, and physicochemical properties were generated directly from SMILES (simplified molecular-input line-entry system) data and used as input features to train multivariate adaptive regression splines for solvent solubility parameter prediction.…”
Section: Machine Learning Approaches (mentioning)
confidence: 99%
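
As an illustration of the SMILES-to-descriptors-to-regression workflow described in that statement, the sketch below computes a few 1D/2D descriptors with RDKit and fits an ordinary least-squares model to example hydrogen-bonding HSP values. The cited study used PaDEL descriptors, a 130-compound dataset, and MARSplines rather than plain linear regression, so the descriptor choices and data values here are assumptions for demonstration only.

import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

def descriptor_row(smiles):
    # A few simple 1D/2D descriptors computed directly from a SMILES string.
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolWt(mol),          # molecular weight
        Descriptors.MolLogP(mol),        # Crippen logP
        Descriptors.TPSA(mol),           # topological polar surface area
        Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
    ]

# Hypothetical training data: four solvents with approximate hydrogen-bonding
# HSP values (MPa^0.5); a real model needs a much larger measured dataset.
smiles = ["CCO", "CC(=O)C", "c1ccccc1", "CCCCCC"]
delta_h = np.array([19.4, 7.0, 2.0, 0.0])

X = np.array([descriptor_row(s) for s in smiles])
X = np.column_stack([np.ones(len(X)), X])        # add an intercept column
coef, *_ = np.linalg.lstsq(X, delta_h, rcond=None)

print("fitted coefficients:", np.round(coef, 3))
print("in-sample predictions:", np.round(X @ coef, 2))
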
“…The algorithms and benchmarking process were performed using STATISTICA 12 (StatSoft Inc.). For more details concerning the above-mentioned ML techniques, see the following papers: (Gorunescu, 2011) for competitive/collaborative systems; (Bishop, 1995) for neural network fundamentals; (Breiman, 2001) for the random forests ML algorithm; (Chu, 2019) for spline models; (De'ath, 2007) for boosted trees; (Denisko & Hoffman, 2018) for random forests; (Elith, Leathwick & Hastie, 2008) for regression trees; (Grant, Eltoukhy & Asfour, 2014) for neural networks; (Natekin & Knoll, 2013) for gradient boosting machines; (Przybylek, Jelinski & Cysewski, 2019) for committee decision; (Stoean & Stoean, 2014) for SVM; (Jammalamadaka, Qiu & Ning, 2019) for the time series approach; (Chinthalapati, Mitra & Serguieva, 2019) for big data and noise.…”
Section: Calculation (mentioning)
confidence: 99%
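
The "committee decision" attributed to Przybylek, Jelinski & Cysewski (2019) can be illustrated by averaging the predictions of several independently trained regressors. The sketch below uses scikit-learn models on synthetic data purely as a stand-in; the cited benchmarking was performed in STATISTICA 12 with the techniques listed above, so the model choices and data here are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Placeholder data standing in for descriptor/property pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

members = [
    RandomForestRegressor(n_estimators=200, random_state=0),
    GradientBoostingRegressor(random_state=0),
    SVR(kernel="rbf", C=10.0),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0),
]
for model in members:
    model.fit(X, y)

# Committee decision: average the individual members' predictions.
committee_pred = np.mean([model.predict(X) for model in members], axis=0)
print("committee in-sample RMSE:", round(float(np.sqrt(np.mean((committee_pred - y) ** 2))), 4))
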