2006
DOI: 10.1007/11731139_15
Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensembles

Cited by 114 publications (58 citation statements)
References 11 publications
“…SVM is a widely used machine learning method that has been applied to many real-world problems with satisfactory results. SVM works effectively on balanced datasets but produces suboptimal classification models on imbalanced ones; several studies demonstrate this (Veropoulos et al, 1999; Akbani et al, 2004; Wu & Chang, 2003; Wu & Chang, 2005; Raskutti & Kowalczyk, 2004; Imam et al, 2006; Zou et al, 2008; Lin et al, 2009; Kang & Cho, 2006; Liu et al, 2006; He & Garcia, 2009). SVM is biased toward the majority class and performs poorly on the minority class.…”
Section: Fuzzy Based Approaches
confidence: 98%
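The bias described above, and the common class-weighting remedy, can be illustrated with a small sketch. This is a generic scikit-learn example on hypothetical synthetic data, not code from the cited works; the sample sizes and Gaussian centers are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 500 majority points vs. 10 minority
# points, drawn from overlapping 2-D Gaussians.
X_maj = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_min = rng.normal(loc=1.5, scale=1.0, size=(10, 2))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 500 + [1] * 10)

# Plain SVM: the decision boundary is dominated by the majority class.
plain = SVC(kernel="rbf").fit(X, y)

# Class-weighted SVM: misclassifying a minority point costs far more,
# one common way to counter the majority-class bias.
weighted = SVC(kernel="rbf", class_weight="balanced").fit(X, y)

def minority_recall(model):
    # Fraction of minority training points the model recovers.
    return model.predict(X_min).mean()

print("plain minority recall:   ", minority_recall(plain))
print("weighted minority recall:", minority_recall(weighted))
```

With a fixed seed the weighted model typically recovers far more of the minority class than the plain one.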
“…Batista et al. [3] proposed to apply SMOTE after performing a data cleaning (i.e., under-sampling) method such as Tomek links or Wilson's Edited Nearest Neighbor Rule. Liu et al. [14] over-sampled the minority class with SMOTE to some extent, then under-sampled the majority class a number of times to create bootstrap samples of the same or similar size as the over-sampled minority class.…”
Section: Related Work
confidence: 99%
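The scheme attributed to Liu et al. above can be sketched roughly as follows. This is an illustrative reconstruction on made-up data, not the authors' code: the interpolation step is a simplified stand-in for SMOTE (no nearest-neighbour search), and the over-sampling ratio, ensemble size, and base SVM settings are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

def smote_like(X_min, n_new, rng):
    """Simplified SMOTE: interpolate between random pairs of minority points."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    t = rng.random((n_new, 1))
    return X_min[i] + t * (X_min[j] - X_min[i])

# Hypothetical imbalanced data: 300 majority vs. 15 minority points.
X_maj = rng.normal(0.0, 1.0, size=(300, 2))
X_min = rng.normal(2.0, 1.0, size=(15, 2))

# Step 1: over-sample the minority class "to some extent" (here, 4x).
X_min_os = np.vstack([X_min, smote_like(X_min, 3 * len(X_min), rng)])

# Step 2: under-sample the majority class several times into bootstrap
# samples roughly the size of the over-sampled minority class, and
# train one SVM per balanced sample.
models = []
for _ in range(5):
    idx = rng.choice(len(X_maj), size=len(X_min_os), replace=True)
    X_bal = np.vstack([X_maj[idx], X_min_os])
    y_bal = np.array([0] * len(idx) + [1] * len(X_min_os))
    models.append(SVC(kernel="rbf").fit(X_bal, y_bal))

def ensemble_predict(X):
    # Majority vote over the individual SVMs.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print(ensemble_predict(np.array([[2.0, 2.0], [0.0, 0.0]])))
```

Each base learner sees a balanced sample, so no single SVM is dominated by the majority class, and the vote aggregates their complementary views of the majority data.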
“…We trained two classifiers using the R language for statistical computing: an ensemble of Support Vector Machines (EnsSVM) (Liu et al. 2006) and a weighted random forest (WRF). Two additional classifiers, namely a (regular) SVM with an RBF kernel and a decision tree with a weighted loss function, were trained on the route-blockage data set, but EnsSVM and WRF outperformed them in terms of maximizing recall.…”
Section: Ensemble Classifiers
confidence: 99%
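A weighted random forest of the kind mentioned above can be set up in a few lines. The quoted work used R; this is a generic scikit-learn equivalent on hypothetical data, where `class_weight="balanced"` plays the role of the weighting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Hypothetical imbalanced data standing in for a route-blockage set:
# 400 majority vs. 20 minority samples with 3 features each.
X = np.vstack([rng.normal(0.0, 1.0, size=(400, 3)),
               rng.normal(1.5, 1.0, size=(20, 3))])
y = np.array([0] * 400 + [1] * 20)

# class_weight="balanced" scales each sample's weight inversely to its
# class frequency, pushing every tree toward higher minority recall.
wrf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0).fit(X, y)

# Training-set recall on the minority class.
minority_recall = wrf.predict(X[y == 1]).mean()
print("minority recall:", minority_recall)
```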