2021
DOI: 10.1186/s13040-021-00240-3
Feature selection using distributions of orthogonal PLS regression vectors in spectral data

Abstract: Feature selection, which is important for successful analysis of chemometric data, aims to produce parsimonious and predictive models. Partial least squares (PLS) regression is one of the main methods in chemometrics for analyzing multivariate data with input X and response Y by modeling the covariance structure in the X and Y spaces. Recently, orthogonal projections to latent structures (OPLS) has been widely used in processing multivariate data because OPLS improves the interpretability of PLS models by remo…

Cited by 16 publications (9 citation statements) | References 32 publications
“…The features were checked for correlations, as highly correlated features would not only be redundant and unnecessary, supplying no new information, but could also degrade the performance, reduce the interpretability, and lead to overfitting of the ML model. Figure a shows a correlation matrix where the Pearson correlation coefficient (eq ) between any two features is indicated. Here, we can see that there is a strong correlation between ρ, D₀, η, and all RDF values, as well as a slightly weaker but significant correlation of T with the aforementioned features.…”
Section: Results
confidence: 99%
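The redundancy check described in the excerpt above can be sketched in a few lines. This is a hypothetical illustration on synthetic data, not the cited authors' code: `redundant_features` and the 0.95 threshold are assumptions for demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic features: x2 is nearly a copy of x1, x3 is independent.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = df.corr()  # Pearson correlation matrix, as in the excerpt

def redundant_features(corr, threshold=0.95):
    """Flag each feature whose |r| with an earlier feature exceeds threshold."""
    drop = set()
    cols = corr.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) > threshold:
                drop.add(cols[j])
    return drop

print(redundant_features(corr))  # x2 is flagged as redundant
```

Dropping one feature of each highly correlated pair removes no information in principle, since the retained feature carries essentially the same signal.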
“…The permutation test, a computer-based resampling method for refitting and prediction, is widely used to compute variable importance and confidence intervals [43,44]. The model can be considered not overfitted when the Y-axis intercepts of R² and Q² for the established OPLS-DA models are less than 0.3 and 0.05, respectively [45][46][47].…”
Section: PCA and OPLS-DA
confidence: 99%
“…Uncovering hidden patterns associated with the target variables is not trivial [10]. A major issue in addressing this problem is handling the high correlation between supplementary and target data, which can lead to overfitting.…”
Section: Predictive Error Compensating Neural Network
confidence: 99%
“…The data most highly correlated with the residual error is selected as supplementary data in the second network. Generally, one or two additional feature components are sufficient [10]; however, the proposed PEC-WNN model can select as many features as correlate with the residual error.…”
Section: Introduction
confidence: 99%
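The selection step quoted above — picking the feature most correlated with the residual error of a first-stage model — can be sketched as follows. This is a hypothetical toy sketch, not the PEC-WNN implementation: the "first network" is faked as a fixed linear predictor, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 6
X = rng.normal(size=(n, p))
# Target depends on features 0 and 3; the first-stage model below only
# captures feature 0, leaving feature 3's signal in the residual.
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 0.1 * rng.normal(size=n)

first_stage_pred = 2.0 * X[:, 0]          # stand-in for the first network
residual = y - first_stage_pred

# Rank candidate features by |Pearson correlation| with the residual error
# and pick the strongest as supplementary data for the second network.
corrs = [abs(np.corrcoef(X[:, j], residual)[0, 1]) for j in range(p)]
best = int(np.argmax(corrs))
print(best)
```

The feature chosen is the one whose signal the first stage failed to model, which is precisely what makes it useful as supplementary input downstream.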