2018
DOI: 10.3390/info9120317

LICIC: Less Important Components for Imbalanced Multiclass Classification

Abstract: Multiclass classification in cancer diagnostics using DNA or gene expression signatures, as well as classification of bacterial species fingerprints in MALDI-TOF mass spectrometry data, is challenging because the data are imbalanced and the number of dimensions is high relative to the number of instances. In this study, a new oversampling technique called LICIC is presented as a valuable instrument for countering both class imbalance and the famous “curse of dimensionality” problem. The method enables preservat…
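Both difficulties named in the abstract, class imbalance and many dimensions per instance, can be quantified before any resampling is attempted. The following is a minimal sketch, assuming labels arrive as a plain list and the feature count is known; the function name `dataset_profile` and the use of the majority-to-class count ratio as the imbalance measure are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def dataset_profile(labels, n_features):
    """Summarise class imbalance and dimensionality of a labelled dataset.

    labels: sequence with one class label per instance (hypothetical input format).
    n_features: number of features per instance (e.g. genes or m/z bins).
    """
    counts = Counter(labels)
    majority = max(counts.values())
    # Ratio of the largest class size to each class size (1.0 means as large as the majority).
    ratios = {cls: majority / n for cls, n in counts.items()}
    n_instances = len(labels)
    return {
        "class_counts": dict(counts),
        "imbalance_ratio": ratios,
        # Values well above 1 indicate more dimensions than instances.
        "features_per_instance": n_features / n_instances,
        "p_greater_than_n": n_features > n_instances,
    }


if __name__ == "__main__":
    y = ["A"] * 30 + ["B"] * 6 + ["C"] * 4
    profile = dataset_profile(y, n_features=2000)
    print(profile["imbalance_ratio"])  # {'A': 1.0, 'B': 5.0, 'C': 7.5}
    print(profile["p_greater_than_n"])  # True
```

In this toy example the smallest class is 7.5 times smaller than the largest, and there are 50 features per instance, which is the regime the abstract describes.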

Cited by 11 publications (6 citation statements)
References 18 publications (29 reference statements)
“…The classification is performed with 10-Fold cross validation with 70-30 ratio maintaining the inter-patient separation scheme: 28 subjects in training and 12 in test set randomly kept 10 different times. The produced train-test sets within 10-Fold cross validation may be imbalanced [54]. For this reason and only when there was relevant imbalance between healthy and sick, it was decided to perform, before feature selection, a novel oversampling technique called LICIC [54].…”
Section: Classifiers and Experimental Setup (mentioning, confidence: 99%)
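The evaluation scheme quoted above can be sketched at the subject level: 40 subjects, 28 for training and 12 for testing, re-drawn 10 times, with class imbalance checked on each training split. As described, this amounts to a repeated random 70/30 hold-out rather than a classical 10-fold partition, because the 10 test sets are drawn independently and may overlap. The helper names, the dictionary input format, the fixed seed, and the `max_ratio=1.5` threshold for "relevant imbalance" are hypothetical; the quoted text gives no threshold. The sketch also assumes that oversampling touches only the training subjects, the usual way to keep synthetic copies out of the test set.

```python
import random
from collections import Counter


def inter_patient_splits(subject_labels, n_train=28, n_repeats=10, seed=0):
    """Yield (train_subjects, test_subjects) pairs drawn at the subject level.

    subject_labels: dict mapping a (hypothetical) subject id to its class label.
    Splitting by subject keeps all data from one patient on a single side.
    """
    rng = random.Random(seed)
    subjects = sorted(subject_labels)
    for _ in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def needs_oversampling(train_subjects, subject_labels, max_ratio=1.5):
    """Return True if the training split's majority/minority ratio exceeds max_ratio.

    A class absent from the training split counts as maximal imbalance.
    """
    classes = set(subject_labels.values())
    counts = Counter(subject_labels[s] for s in train_subjects)
    smallest = min(counts.get(c, 0) for c in classes)
    if smallest == 0:
        return True
    return max(counts.values()) / smallest > max_ratio


if __name__ == "__main__":
    labels = {f"s{i:02d}": ("sick" if i < 20 else "healthy") for i in range(40)}
    for train, test in inter_patient_splits(labels):
        if needs_oversampling(train, labels):
            pass  # oversample the training subjects only, then run feature selection
        print(len(train), len(test))  # 28 12
```

Because each repetition reshuffles all 40 subjects independently, no two splits are guaranteed to share or avoid test subjects; a strict k-fold scheme would instead partition the subjects once.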
“…The produced train-test sets within 10-Fold cross validation may be imbalanced [54]. For this reason and only when there was relevant imbalance between healthy and sick, it was decided to perform, before feature selection, a novel oversampling technique called LICIC [54]. This oversampling technique creates new instances balancing the minority classes by preserving nonlinearities and the particular pattern present in each specific class.…”
Section: Classifiers and Experimental Setup (mentioning, confidence: 99%)
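The idea suggested by the method's name, creating a new instance of a class by keeping the most important components of one instance and borrowing the less important components from another instance of the same class, can be illustrated with a short NumPy sketch. This is a hypothetical illustration and not the published LICIC procedure: it uses linear PCA computed with `numpy.linalg.svd`, so unlike the quoted description it does not model non-linearities, and the function names and the `n_keep` parameter are assumptions.

```python
import numpy as np


def swap_minor_components(X_class, n_new, n_keep, rng=None):
    """Generate n_new synthetic rows for one class by mixing PCA scores.

    Hypothetical illustration: linear PCA stands in for LICIC's actual projection.
    X_class: (n_samples, n_features) array containing instances of a single class.
    n_keep: leading (most important) components copied from a base row; the
            remaining, less important components are copied from a donor row.
    """
    rng = np.random.default_rng(rng)
    mean = X_class.mean(axis=0)
    centred = X_class - mean
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt.T  # component scores of each instance
    base = rng.integers(0, len(X_class), size=n_new)
    donor = rng.integers(0, len(X_class), size=n_new)
    new_scores = scores[donor].copy()
    new_scores[:, :n_keep] = scores[base, :n_keep]
    # Project the mixed scores back to the original feature space.
    return new_scores @ vt + mean


def oversample_to_majority(X, y, n_keep=2, seed=0):
    """Top up every minority class (y must be a NumPy array) to the majority size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            synth = swap_minor_components(X[y == cls], target - count, n_keep, rng)
            X_parts.append(synth)
            y_parts.append(np.full(target - count, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))
    y = np.array(["a"] * 30 + ["b"] * 7 + ["c"] * 3)
    X_bal, y_bal = oversample_to_majority(X, y)
    print(X_bal.shape)  # (90, 10)
    print(np.unique(y_bal, return_counts=True)[1])  # [30 30 30]
```

Because both source rows come from the same class, each synthetic row lies in the affine span of that class's instances, and the majority class is left untouched. A kernel projection would be needed to capture the non-linear structure the quoted text refers to.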
“…Dentamaro et al [18] propose a new oversampling technique called Less Important Components for Imbalanced Multiclass Classification-LICIC to cope with both class imbalance and the famous "curse of dimensionality" problem. The method enables preservation of non-linearities within the dataset, while creating new instances without adding noise.…”
Section: Contribution (mentioning, confidence: 99%)
“…[24,26] Finally, support vector machines, random forest classifiers, and novel oversampling techniques have been implemented to improve the characterization of spectra from highly similar bacteria.[27] We sought to determine whether the Aristotle Classifier, which was developed for glycomics studies, could provide additional benefits to the field of bacterial identification. To test this hypothesis, we used nine different data sets containing MALDI-TOF MS data of bacterial proteins, where the all the members of a given data set originate from the same genus.…”
Section: Introduction (mentioning, confidence: 99%)
“…In many cases, machine learning methods have provided researchers better ability to discriminate bacterial samples than what would be gained by simply comparing user-generated spectra to a set of prototypes. In one case, the use of the XGBoost classifier was pivotal in enabling the identification of polymicrobial species based on MS data of their membrane glycolipids. In other examples, researchers have used a variety of machine learning strategies to discriminate samples of bacterial mixtures. Finally, support vector machines, random forest classifiers, and novel oversampling techniques have been implemented to improve the characterization of spectra from highly similar bacteria…”
(mentioning, confidence: 99%)