2010
DOI: 10.1002/cem.1360

Variable selection in regression—a tutorial

Abstract: This paper provides a practical guide to variable selection in chemometrics with a focus on regression-based calibration models. Several approaches, such as genetic algorithms (GAs), jack-knifing, forward selection, etc., are explained; it is also explained how to choose between different kinds of variable selection methods. The emphasis in this paper is on how to use variable selection in practice and avoid the most common pitfalls.

Citations: cited by 614 publications (371 citation statements)
References: 28 publications (34 reference statements)
“…It is presented on Figure S1 in the supplementary material. Thus, it confirmed the earlier statement of Andersen and Bro [25], that the genetic algorithm for individual wavelengths cannot produce appropriate results.…”
Section: The Results Of GA
Citation type: supporting
confidence: 88%
“…The iPLS method is a very common choice for variable selection especially in the case of near-infrared spectra and NMR spectra because spectral data are highly correlated and the usage of variable windows is a better option than examination of each variable individually [25][26][27][28]. This technique is very similar to the original PLS method, but here the spectra are divided into a number of intervals (equal length or manually made intervals).…”
Section: Interval PLS Methods (iPLS)
Citation type: mentioning
confidence: 99%
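The interval idea described in the statement above can be sketched as follows. This is only an illustration of the splitting and ranking steps, not the cited authors' implementation: the function names `split_into_intervals` and `rank_intervals` are hypothetical, and `score_fn` is an assumed user-supplied callable that would fit a PLS model on one interval and return an error such as RMSECV (lower is better).

```python
from typing import Callable, List, Tuple


def split_into_intervals(n_variables: int, n_intervals: int) -> List[range]:
    """Split variable indices 0..n_variables-1 into contiguous, near-equal intervals."""
    if not 1 <= n_intervals <= n_variables:
        raise ValueError("n_intervals must be between 1 and n_variables")
    base, extra = divmod(n_variables, n_intervals)
    intervals: List[range] = []
    start = 0
    for i in range(n_intervals):
        # The first `extra` intervals each get one additional variable.
        width = base + (1 if i < extra else 0)
        intervals.append(range(start, start + width))
        start += width
    return intervals


def rank_intervals(
    intervals: List[range],
    score_fn: Callable[[range], float],
) -> List[Tuple[float, range]]:
    """Score every interval with score_fn (assumed: lower is better) and sort ascending."""
    scored = [(score_fn(interval), interval) for interval in intervals]
    return sorted(scored, key=lambda pair: pair[0])
```

In practice `score_fn` would wrap a local PLS model built on the columns of the spectra that fall in the given interval; manually chosen intervals can be passed to `rank_intervals` directly instead of the equal-length split.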
“…In case of HPLC and CE data processing variable selection [35] was performed on the basis of regression coefficients of the variables in the models. Root mean squared error of cross-validation (RMSECV) and root mean squared error of prediction (RMSEP) were employed as metrics for predictive performance of the models.…”
Section: Data Processing
Citation type: mentioning
confidence: 99%
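For reference, the two metrics named in the statement above are usually defined as below. The notation is an assumption introduced here, not taken from the citing paper: $y_i$ is the reference value for sample $i$, $\hat{y}_{(i)}$ is its prediction from a model fitted without the cross-validation segment containing sample $i$, $n$ is the number of calibration samples, and $m$ is the number of independent test samples with predictions $\hat{y}_j$.

```latex
\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2},
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\bigl(y_j - \hat{y}_j\bigr)^2}
```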
“…Instead we employed the whole chromatographic profile ("unresolved" raw data) as input for PLS modeling. In order to improve the quality of the regression models, variable selection was performed on the basis of the values of regression coefficients of corresponding variables [35] using RMSECV as a performance metric. During this procedure irrelevant and noisy variables were removed from the processing.…”
Section: HPLC Analysis
Citation type: mentioning
confidence: 99%
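The statement above does not spell out the exact selection procedure, so the following is only one plausible reading: a backward elimination that repeatedly drops the variable with the smallest absolute regression coefficient and keeps the subset with the lowest RMSECV. The function `backward_select` and its callables `fit_coefs` and `rmsecv` are hypothetical; in a real analysis they would wrap a PLS fit and a cross-validation routine.

```python
from typing import Callable, List, Sequence, Tuple


def backward_select(
    variables: Sequence[int],
    fit_coefs: Callable[[List[int]], List[float]],
    rmsecv: Callable[[List[int]], float],
    min_vars: int = 1,
) -> Tuple[List[int], float]:
    """Drop the smallest-|coefficient| variable step by step; return the best subset by RMSECV."""
    current = list(variables)
    best_subset, best_error = list(current), rmsecv(current)
    while len(current) > min_vars:
        # Assumed contract: one coefficient per variable, in the order of `current`.
        coefs = fit_coefs(current)
        weakest = min(range(len(current)), key=lambda i: abs(coefs[i]))
        del current[weakest]
        error = rmsecv(current)
        if error < best_error:
            best_subset, best_error = list(current), error
    return best_subset, best_error
```

Because the same data drive both the elimination and the RMSECV estimate, the final error is likely optimistic; an independent test set (RMSEP) gives a fairer check of the selected subset.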