2021
DOI: 10.1016/j.forsciint.2021.110998
Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data

Cited by 11 publications (17 citation statements)
References 38 publications
“…Furthermore, it was interesting that for our dataset Random Forest performed slightly better than Multilayer Perceptron algorithms when using the Partial feature set (accuracies for the best candidates were 88% and 87%, respectively). Two possible factors could be involved: (1) the nature of Random Forest, which is based on conditional decisions, allows better handling of a mixture of categorical and numerical features 42; and (2) the relatively small dataset of our study (total dataset: 385, training dataset: 308) may be a more pronounced constraint for the Multilayer Perceptron, which typically requires large datasets. As an example, using the Partial feature set (16 features), the number of parameters for a relatively simple two-layer MLP (MLP-Partial-10-16) is 363 (weights: 16*10 + 10*16 + 16*1 = 336; biases: 10 + 16 + 1 = 27), which is greater than the number of our training instances and potentially results in underfitting of the predictive models.…”
Section: Discussion
confidence: 99%
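The parameter count quoted above (one weight per connection, one bias per non-input unit) can be reproduced with a short sketch. The function name is hypothetical, not from the cited paper; the layer sizes 16-10-16-1 are the MLP-Partial-10-16 architecture described in the statement.

```python
def mlp_param_count(layer_sizes):
    """Total trainable parameters of a fully connected MLP.

    layer_sizes lists the units per layer, e.g. [16, 10, 16, 1]
    for 16 inputs, hidden layers of 10 and 16, and one output.
    """
    # One weight for every connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    # One bias for every unit outside the input layer.
    biases = sum(layer_sizes[1:])
    return weights + biases

# MLP-Partial-10-16: 336 weights + 27 biases = 363 parameters,
# which exceeds the 308 training instances mentioned above.
print(mlp_param_count([16, 10, 16, 1]))  # → 363
```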
“…The third approach integrates RF and GI under the SFFS algorithm (identified as Phase 2 with RF-GI in Table 5), which is equivalent to traditional wrapper-based methods widely described in the literature. 19 The best result for each dataset is highlighted in bold.…”
Section: The Importance of the Two-Phase Approach for Wavelength Selection
confidence: 99%
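The wrapper-based selection mentioned above (SFFS-style search scored by a classifier) can be illustrated with a minimal greedy forward pass, assuming scikit-learn; the data, function name, and hyperparameters are hypothetical stand-ins, not the cited study's pipeline, and a true SFFS additionally allows conditional removals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for a spectral matrix: rows = samples, columns = wavelengths.
X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                           random_state=0)

def forward_select(X, y, n_keep=3):
    """Greedy forward wrapper: repeatedly add the wavelength that most
    improves cross-validated Random Forest accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_keep:
        scores = {
            j: cross_val_score(
                RandomForestClassifier(n_estimators=50, random_state=0),
                X[:, selected + [j]], y, cv=3,
            ).mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

print(forward_select(X, y, n_keep=3))  # indices of the chosen wavelengths
```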
“…Numerous techniques have relied on importance indices for guiding the removal of less informative features. 8,10,18–22 Another effective yet less employed strategy for selecting the most informative wavelengths in complex datasets is wavelength clustering. In this concept, wavelengths carrying similar information are grouped into clusters, and a supervised algorithm is applied to each cluster separately.…”
confidence: 99%
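The wavelength-clustering idea described above can be sketched as follows, assuming scikit-learn: wavelengths (columns) with similar intensity profiles are grouped, then a supervised classifier is fitted to each cluster separately. The synthetic spectra and all parameters here are hypothetical illustrations, not the cited authors' data or method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical spectra: 90 samples x 30 wavelengths in 3 correlated blocks, 3 classes.
y = np.repeat([0, 1, 2], 30)
base = np.eye(3)[y] + 0.3 * rng.normal(size=(90, 3))
X = np.repeat(base, 10, axis=1) + 0.1 * rng.normal(size=(90, 30))

# Group wavelengths (columns) whose intensity profiles across samples are similar.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X.T)

# Apply a supervised classifier to each wavelength cluster separately.
for k in range(3):
    cols = np.flatnonzero(labels == k)
    acc = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        X[:, cols], y, cv=3,
    ).mean()
    print(f"cluster {k}: {len(cols)} wavelengths, CV accuracy {acc:.2f}")
```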
“…For this model, χ2 is the variable-selection method, and RF is the classification technique. 19 As samples become more complex, it can take multiple analysis methods to completely characterize a sample. Iakab et al. proposed a multimodal approach for analyzing biological samples that combines the results of mass spectrometry and vibrational spectroscopy for imaging to obtain spatial resolution and molecular information.…”
Section: Introduction
confidence: 99%
“…To classify data like illicit substances, they utilized RF to select areas of interest and combine the wavelengths of importance into a model named χ2–RF. For this model, χ2 is the variable-selection method, and RF is the classification technique …”
Section: Introduction
confidence: 99%
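A χ2-filter-then-RF pipeline of the kind these statements describe could be sketched as below, assuming scikit-learn's `SelectKBest`/`chi2` and `RandomForestClassifier`. This is a generic illustration of the two-stage idea, not the cited paper's actual χ2–RF implementation; the synthetic data and the choice of k are hypothetical. Note that `chi2` requires non-negative features, which raw spectral intensities typically are.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical multiclass "spectra": 150 samples x 40 wavelengths, 3 classes.
X, y = make_classification(n_samples=150, n_features=40, n_informative=8,
                           n_classes=3, random_state=0)
X = X - X.min()  # shift to non-negative values, mimicking intensity data

# Stage 1: chi2 ranks wavelengths; stage 2: RF classifies on the survivors.
model = make_pipeline(
    SelectKBest(chi2, k=10),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
score = cross_val_score(model, X, y, cv=3).mean()
print(f"mean CV accuracy: {score:.2f}")
```

Fitting the selector inside the pipeline keeps the χ2 ranking restricted to each training fold, avoiding selection leakage into the validation folds.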