2014
DOI: 10.1016/j.neuroimage.2013.10.005

Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study

Abstract: Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for the structural magnetic resonance imaging (MRI) modality and six times the control cases for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall…

Cited by 168 publications (85 citation statements)
References 60 publications
“…A typical machine learning algorithm trained using an imbalanced data set assigns new observations to the majority class (e.g. suicide non-attempters) (Dubey et al, 2014). In this study, the class imbalance problem was circumvented by ‘under-sampling’ the majority class (suicide non-attempters) followed by training an algorithm with a balanced sample – a process which was repeated until all observations in the majority class were selected at least once and predictions aggregated as shown in Figure 1.…”
Section: Methods
confidence: 99%
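The repeated-undersampling scheme described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: `fit_predict` is a hypothetical callback standing in for any classifier, and majority-vote averaging is one plausible reading of "predictions aggregated".

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample_ensemble(X, y, fit_predict, minority_label=1):
    """Repeatedly undersample the majority class, train on each balanced
    subset, and aggregate the resulting predictions by majority vote.

    fit_predict(X_train, y_train, X_test) is any routine that trains a
    classifier on the balanced subset and returns 0/1 predictions.
    """
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    rng.shuffle(majority)
    # Partition the majority class so every observation is used at least once.
    n_chunks = int(np.ceil(len(majority) / len(minority)))
    votes = []
    for chunk in np.array_split(majority, n_chunks):
        idx = np.concatenate([minority, chunk])
        votes.append(fit_predict(X[idx], y[idx], X))
    # Average the per-subset predictions and threshold at 0.5.
    return (np.mean(votes, axis=0) >= 0.5).astype(int)
```

Each balanced subset reuses the full minority class, so only the majority class is partitioned; the threshold on the averaged votes implements the aggregation step.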
“…Downsampling works by randomly selecting equal numbers of the minority class as the majority class for training, thus creating a more balanced dataset for training in each fold. One issue with downsampling is that it could lead to loss of potentially important information (35). On the other hand, SMOTE algorithm multiplies the vector between K neighboring samples of a sample by a random value between 0 and 1, and adds the result to that sample, creating synthetic samples.…”
Section: Methods
confidence: 99%
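The synthetic-sample step this statement attributes to SMOTE (scale the difference vector to a neighbouring minority sample by a random factor in [0, 1] and add it back) can be sketched as follows; a minimal illustration, not the reference SMOTE implementation.

```python
import numpy as np

def smote_sample(X_min, k=3, n_new=10, rng=None):
    """Generate synthetic minority-class samples by interpolating between
    each sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))     # pick a minority sample
        j = rng.choice(neighbours[i])    # pick one of its k neighbours
        gap = rng.random()               # uniform random factor in [0, 1)
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(new)
```

Because each synthetic point lies on the segment between two real minority samples, it stays inside the minority class's convex hull, which is why SMOTE avoids the information loss that downsampling can cause.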
“…Moreover, beyond classification efficiency reasons and for clinical considerations, it could also be interesting to know the most impacted brain regions. In this work, we selected the most relevant anatomical sub-ensembles by using SLR with L 1 /L 2 -norm regularization [15,16]. It has been established that combining the two norms take into account possible inter-feature correlation while imposing sparsity [17].…”
Section: Anatomical Sub-ensemble Selection and Weighting
confidence: 99%
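Reading the combined L1/L2-norm penalty as an elastic-net term is one plausible interpretation of the statement above (the cited works [15-17] may instead use a grouped L1/L2 mixed norm); under that assumption, sparse logistic regression can be sketched with proximal gradient descent:

```python
import numpy as np

def elasticnet_logreg(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=500):
    """Logistic regression with the combined penalty
        lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2),
    fit by proximal gradient descent: a gradient step on the smooth part
    (logistic loss + L2), then soft-thresholding for the L1 part."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        z = X @ w
        # Gradient of the logistic loss plus the L2 (ridge) term.
        grad = X.T @ (1.0 / (1.0 + np.exp(-z)) - y) / n + lam * (1 - alpha) * w
        w -= lr * grad
        # Proximal step for the L1 term: soft-thresholding induces sparsity.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam * alpha, 0.0)
    return w
```

The L1 part zeroes out uninformative coefficients (sparsity), while the L2 part keeps correlated informative features from being arbitrarily dropped, matching the rationale quoted above.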