This paper proposes a novel Integrated Oversampling (INOS) method that can handle highly imbalanced time series classification. We introduce an enhanced structure preserving oversampling (ESPO) technique and combine it with interpolation-based oversampling. ESPO generates a large percentage of the synthetic minority samples from a multivariate Gaussian distribution, by estimating the covariance structure of the minority-class samples and regularizing the unreliable eigen spectrum. To protect the key original minority samples, we use an interpolation-based technique to generate the remaining small percentage of the synthetic population. By preserving the main covariance structure and creating protective variances in the trivial eigen dimensions, ESPO expands the synthetic samples into the void area of the data space without tying them too closely to existing minority-class samples. Preserving the main covariance structure also addresses a key challenge in applying oversampling to imbalanced time series classification, namely maintaining the correlation between consecutive values. Extensive experiments on seven public time series data sets demonstrate that our INOS approach, used with support vector machines (SVM), outperforms existing oversampling methods as well as state-of-the-art methods in time series classification.
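The two-part scheme described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `n_keep` cutoff between reliable and unreliable eigen dimensions, the constant floor used to regularize the trailing eigenvalues, and the 70/30 split in `gaussian_share` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def espo_like_oversample(X_min, n_new, n_keep=None, rng=None):
    """Draw n_new samples from a Gaussian fitted to the minority class X_min."""
    rng = np.random.default_rng(rng)
    mean = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative values
    eigvecs = eigvecs[:, order]
    if n_keep is None:
        # Assumed cutoff: half of the largest possible rank of the covariance.
        n_keep = max(1, min(X_min.shape[0] - 1, X_min.shape[1]) // 2)
    reg = eigvals.copy()
    # Assumed regularizer: give every trailing ("trivial") dimension the
    # variance of the last reliable one, so samples can spread into it.
    reg[n_keep:] = eigvals[n_keep - 1]
    z = rng.standard_normal((n_new, X_min.shape[1]))
    return mean + (z * np.sqrt(reg)) @ eigvecs.T

def interpolate_oversample(X_min, n_new, rng=None):
    """Interpolate between random minority samples and their nearest neighbour."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X_min), size=n_new)
    dist = np.linalg.norm(X_min[i][:, None, :] - X_min[None, :, :], axis=2)
    dist[np.arange(n_new), i] = np.inf            # exclude the seed itself
    j = dist.argmin(axis=1)
    lam = rng.random((n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])

def inos_like_oversample(X_min, n_new, gaussian_share=0.7, rng=None):
    """Mix Gaussian-based and interpolation-based synthetic samples."""
    rng = np.random.default_rng(rng)
    n_gauss = int(round(gaussian_share * n_new))
    return np.vstack([
        espo_like_oversample(X_min, n_gauss, rng=rng),
        interpolate_oversample(X_min, n_new - n_gauss, rng=rng),
    ])
```

Here each row of `X_min` is one minority-class time series of fixed length. The Gaussian part spreads samples along the estimated covariance structure, while the interpolated part stays inside the bounding box of the original minority samples.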
In a mature manufacturing system, operating fault conditions are few and far between. The majority of the data collected from such systems typically exhibits normal operating behaviour. This phenomenon creates an imbalance between the class distributions of the data. The imbalance ratio may range from 1:100 to 1:1000, that is, 100 to 1000 normal samples for every fault sample available. The nature of such datasets makes it harder to build reliable models for accurate fault diagnosis in Condition-Based Maintenance (CBM), because there are too few learning exemplars of the fault class. Conventional machine learning algorithms do not handle imbalanced datasets well and generally produce poor classification results. To improve fault diagnosis reliability on class-imbalanced datasets, this paper proposes a hybrid rebalancing approach that combines Support Vector Machine (SVM) undersampling with Mega Trend Diffusion (MTD) oversampling. Our proposed approach rebalances the dataset by (1) reducing the amount of normal condition data while retaining the most informative samples and (2) boosting the number of fault condition samples to match the size of the normal data. This approach is highly applicable to the manufacturing setting because the data has a degree of predictability, i.e. data from different fault conditions tend to cluster together in the feature space. Thus, manipulating the data at this level is a logical step. Learning effectively from the limited available fault data can translate into significant cost savings. Our approach is demonstrated and validated with a case study on bearing fault detection. Finally, we discuss conclusions and future work.
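The two rebalancing steps can be sketched in NumPy as follows. This is not the method of the paper: distance to the fault-class centroid is used here as a plain stand-in for SVM-based selection of informative normal samples, and uniform sampling from a slightly widened per-feature range is a stand-in for MTD. The function names and the `widen` parameter are hypothetical.

```python
import numpy as np

def undersample_nearest(X_normal, X_fault, n_keep):
    """Keep the n_keep normal samples closest to the fault-class centroid.

    Stand-in for SVM-based selection: samples near the other class are
    assumed to be the most informative ones.
    """
    centroid = X_fault.mean(axis=0)
    dist = np.linalg.norm(X_normal - centroid, axis=1)
    return X_normal[np.argsort(dist)[:n_keep]]

def oversample_diffused(X_fault, n_total, widen=0.1, rng=None):
    """Return X_fault plus synthetic rows, n_total rows overall.

    Stand-in for MTD: each feature range is widened by `widen` times its
    span on both sides, and synthetic values are drawn uniformly from it.
    """
    rng = np.random.default_rng(rng)
    lo, hi = X_fault.min(axis=0), X_fault.max(axis=0)
    span = hi - lo
    n_new = max(0, n_total - len(X_fault))
    synth = rng.uniform(lo - widen * span, hi + widen * span,
                        size=(n_new, X_fault.shape[1]))
    return np.vstack([X_fault, synth])

def rebalance(X_normal, X_fault, n_per_class, rng=None):
    """Build a balanced dataset with n_per_class rows per class.

    Labels: 0 = normal condition, 1 = fault condition.
    """
    X0 = undersample_nearest(X_normal, X_fault, n_per_class)
    X1 = oversample_diffused(X_fault, n_per_class, rng=rng)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0), dtype=int),
                        np.ones(len(X1), dtype=int)])
    return X, y
```

For example, with 1000 normal rows and 5 fault rows, `rebalance(X_normal, X_fault, n_per_class=50)` returns 100 rows with 50 of each label, where the fault class contains the 5 original rows followed by 45 synthetic ones.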