2011 IEEE 11th International Conference on Data Mining 2011
DOI: 10.1109/icdm.2011.137

SPO: Structure Preserving Oversampling for Imbalanced Time Series Classification

Abstract: This paper presents a novel structure preserving oversampling (SPO) technique for classifying imbalanced time series data. SPO generates synthetic minority samples from a multivariate Gaussian distribution by estimating the covariance structure of the minority class and regularizing the unreliable eigen spectrum. By preserving the main covariance structure and intelligently creating protective variances in the trivial eigen feature dimensions, the synthetic samples expand effectively into the void area in th…
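
The general idea in the abstract (estimate the minority-class covariance, regularize the unreliable part of its eigen spectrum, then draw synthetic samples from a multivariate Gaussian) can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the function name `oversample_gaussian`, the `n_reliable` cutoff, and the `floor_scale` rule for filling in the trailing eigenvalues are all assumptions made here for illustration, since the abstract does not state how SPO chooses them.

```python
import numpy as np


def oversample_gaussian(X_min, n_new, n_reliable, floor_scale=0.5, seed=0):
    """Draw synthetic minority samples from a Gaussian with a regularized covariance.

    Hypothetical sketch: `n_reliable` and `floor_scale` are illustrative
    knobs, not parameters taken from the SPO paper.
    """
    X_min = np.asarray(X_min, dtype=float)
    n_dims = X_min.shape[1]
    if not 1 <= n_reliable <= n_dims:
        raise ValueError("n_reliable must be between 1 and the number of dimensions")

    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False)  # (n_dims, n_dims) sample covariance

    # Eigendecomposition of the symmetric covariance, sorted largest first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]

    # Keep the leading eigenvalues (the "main" structure) unchanged and give
    # the trailing, poorly estimated directions a small positive variance.
    # ASSUMPTION: a constant floor tied to the last reliable eigenvalue.
    reg = eigvals.copy()
    reg[n_reliable:] = floor_scale * eigvals[n_reliable - 1]

    # Sample z ~ N(0, I), scale per eigen direction, rotate back, shift by the mean.
    z = rng.standard_normal((n_new, n_dims))
    return mean + (z * np.sqrt(reg)) @ eigvecs.T


# Usage: 10 minority series of length 30, far fewer samples than dimensions.
X_minority = np.random.default_rng(42).normal(size=(10, 30))
synthetic = oversample_gaussian(X_minority, n_new=50, n_reliable=5)
print(synthetic.shape)  # (50, 30)
```

With fewer samples than dimensions, the sample covariance is rank-deficient, so an unregularized Gaussian would generate nothing outside the span of the observed minority samples. Assigning a nonzero variance to the trailing directions is what lets the synthetic samples spread beyond that span.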

Cited by 40 publications (33 citation statements)
References 21 publications
“…Compared with our previous preliminary SPO work [26], the current proposed INOS method differs and improves in the following aspects: 1) The oversampling is performed in the signal space with improved efficiency and no risk of artificially introducing variances in the common null space.…”
Section: Our Solution Via Structure Preservation
mentioning
confidence: 95%
“…As tabulated in Table 7, using our most populous time series data set, Wafer, our current MATLAB implementation takes an average of 2.1 × 10⁻² and 1.5 × 10⁻² second for SPO [26] and our proposed INOS, respectively, to create a synthetic sample of 152 dimensions using an ordinary computer with 2.79-GHz CPU. For Yoga, which has the longest time series length, it took 1.7 × 10⁻¹ and 5.0 × 10⁻² second, respectively, for SPO and our INOS to create a sample of 426 dimensions.…”
Section: Computation Efficiency
mentioning
confidence: 99%
“…Motivated by the data paucity and multimodality in the minority class (e.g., there may be two distinct failure modes for aircrafts), this paper proposes an oversampling method based on a parsimonious mixture of Gaussian trees model for imbalanced time-series classification. Such a model is shown to: 1) compare excellently with other state-of-the-art methods [1], [3] in terms of classification accuracy; 2) require the estimation of far fewer parameters compared with the existing methods; 3) model multimodal minority classes using mixture distributions; 4) model the dependencies between the points in a time-series explicitly; and 5) have a relatively low computational complexity. Existing techniques for class imbalance can be divided into two categories: those at the algorithm-level [4]-[6] and those at the data-level [1], [7]-[16].…”
mentioning
confidence: 96%
“…Such a model is shown to: 1) compare excellently with other state-of-the-art methods [1], [3] in terms of classification accuracy; 2) require the estimation of far fewer parameters compared with the existing methods; 3) model multimodal minority classes using mixture distributions; 4) model the dependencies between the points in a time-series explicitly; and 5) have a relatively low computational complexity. Existing techniques for class imbalance can be divided into two categories: those at the algorithm-level [4]-[6] and those at the data-level [1], [7]-[16]. Algorithm-level approaches correct the class imbalance by incorporating a predefined cost assigned to each class [4].…”
mentioning
confidence: 96%