2010
DOI: 10.1002/cem.1349

Preventing over‐fitting in PLS calibration models of near‐infrared (NIR) spectroscopy data using regression coefficients

Abstract: Selection of the number of latent variables (LVs) to include in a partial least squares (PLS) model is an important step in the data analysis. Inclusion of too few or too many LVs may lead to, respectively, under- or over-fitting of the data and subsequently result in poor future model performance. One well-known sign of over-fitting is the appearance of noise in regression coefficients; this often takes the form of a reduction in apparent structure and the presence of sharp peaks with a high degree of direct…
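The abstract points to noise in the PLS regression coefficients as a symptom of over-fitting. The following is a minimal sketch, not the paper's actual procedure, that fits PLS models with an increasing number of latent variables to synthetic "NIR-like" data and tracks how rough the coefficient vector becomes; the data and variable names are illustrative assumptions.

```python
# Sketch: over-fitting a PLS model makes the regression coefficient vector
# visibly noisier. Synthetic data only; not the paper's method or data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 60, 300

# Smooth, correlated "spectra" plus noise (a stand-in for NIR measurements)
base = np.cumsum(rng.normal(size=(n_samples, n_wavelengths)), axis=1)
X = base + 0.05 * rng.normal(size=base.shape)
y = base[:, 150] + 0.1 * rng.normal(size=n_samples)   # analyte-like response

for n_lv in (2, 10, 25):
    pls = PLSRegression(n_components=n_lv).fit(X, y)
    b = pls.coef_.ravel()
    # Crude roughness proxy: summed squared differences of adjacent
    # coefficients; it typically grows sharply once too many LVs are included.
    roughness = np.sum(np.diff(b) ** 2)
    print(f"LVs = {n_lv:2d}  roughness of b = {roughness:.3e}")
```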

Cited by 169 publications (122 citation statements) · References 13 publications
“…Another measure recently studied to characterize model complexity is the jaggedness of the model vector [28] defined by…”
Section: PLS (mentioning)
confidence: 99%
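The quoted definition is cut off. One common way to quantify the jaggedness of a regression vector b with p elements, stated here as an assumption since the exact expression from [28] is not shown, is the sum of squared first differences:

```latex
J(\mathbf{b}) \;=\; \sum_{i=2}^{p} \left( b_i - b_{i-1} \right)^2
```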
“…As noted in section 1, approaches have been developed to remove the potential ambiguity in determining the corner region of an L-curve by forming U-curves with the best tuning parameter value at the minimum allowing automatic tuning parameter selection [20,23,28]. Two specific merits to be evaluated with SRD in this study are generalized CV (GCV) [46], AIC [47], BIC [48], trace((X^T X)^+) [21], and others [12,18,19].…”
Section: RR (mentioning)
confidence: 99%
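The quoted passage names GCV, AIC, and BIC among the merits used for automatic tuning parameter selection in ridge regression (RR). As an illustration of one of them, the sketch below computes the standard GCV criterion over a grid of ridge parameters on made-up data; the function name and data are assumptions, not the cited study's code.

```python
# Sketch: generalized cross-validation (GCV) for choosing the ridge
# parameter lambda.  GCV(lam) = (RSS/n) / (1 - trace(H)/n)^2, where
# H = X (X^T X + lam I)^{-1} X^T is the ridge hat matrix.
import numpy as np

def gcv_score(X, y, lam):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    residuals = y - H @ y
    edf = np.trace(H)                      # effective degrees of freedom
    return (residuals @ residuals / n) / (1.0 - edf / n) ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))
y = X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=50)

lambdas = np.logspace(-4, 2, 25)
scores = [gcv_score(X, y, lam) for lam in lambdas]
best = lambdas[int(np.argmin(scores))]
print(f"lambda minimizing GCV: {best:.4g}")
```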
“…Evaluation metrics are calculated on samples which have not been involved in the model-building process (Esbensen and Geladi, 2010). Examples of metrics include the minimum root-mean-square error of cross validation (RMSECV) (one of the most widely used metrics; Gowen et al, 2011), one standard deviation above RMSECV (Hastie et al, 2009), Wold's R criterion (Wold, 1978), the coefficient-of-determination value (van der Voet, 1994; Wiklund et al, 2007), among others. A suite of these metrics can also be considered simultaneously (Zhao et al, 2015).…”
mentioning
confidence: 99%
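This passage lists RMSECV and the "one standard deviation above RMSECV" rule as criteria for choosing the number of latent variables. The sketch below, an illustration on synthetic data rather than any of the cited works' code, computes RMSECV over a range of PLS latent variables and reports both the minimum-RMSECV choice and the one-SD heuristic.

```python
# Sketch: choose the number of PLS latent variables via RMSECV and the
# one-standard-deviation rule.  Synthetic data and names are assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 100)).cumsum(axis=1)          # smooth synthetic spectra
y = X[:, 40] + 0.2 * rng.normal(size=80)

max_lv = 15
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse_mean, rmse_std = [], []
for n_lv in range(1, max_lv + 1):
    fold_rmse = []
    for train, test in cv.split(X):
        model = PLSRegression(n_components=n_lv).fit(X[train], y[train])
        pred = model.predict(X[test]).ravel()
        fold_rmse.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    rmse_mean.append(np.mean(fold_rmse))
    rmse_std.append(np.std(fold_rmse))

rmse_mean, rmse_std = np.array(rmse_mean), np.array(rmse_std)
best = int(np.argmin(rmse_mean))
# One-SD rule: smallest model whose RMSECV is within one SD of the minimum
threshold = rmse_mean[best] + rmse_std[best]
one_sd_choice = int(np.argmax(rmse_mean <= threshold)) + 1
print(f"min-RMSECV choice: {best + 1} LVs;  one-SD rule choice: {one_sd_choice} LVs")
```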