Optimizing the early glaucoma detection from visual fields by combining preprocessing techniques and ensemble classifier with selection strategies

Tékouabou, Stéphane Cédric Koumétio; Alaoui, El Arbi Abdellaoui; Chabbar, Imane; Toulni, Hamza; Cherif, Walid; Satori, Hassan

doi:10.1016/j.eswa.2021.115975

Cited by 15 publications

(12 citation statements)

References 60 publications


“…In their work, the SMOTE algorithm proved to be the most relevant for balancing the data. Concerning random under-sampling of the majority class, previous authors have shown that this is not desirable, especially in combination with the ensemble method, as it can lead to data information loss [21]. SMOTE's strategy is to create an artificial instance of a minority class through the following operating process: Considering an instance x_i of the minority class, the algorithm starts by creating a new artificial instance from x_i by first separating the k nearest neighbors to x_i, from the minority class.…”

Section: Smote Methods For Data Balancing (mentioning)

confidence: 99%

“…SMOTE's strategy is to create an artificial instance of a minority class through the following operating process: Considering an instance x_i of the minority class, the algorithm starts by creating a new artificial instance from x_i by first separating the k nearest neighbors to x_i, from the minority class. Then, randomly choose a neighbor and finally generate a synthetic example on the fictive line joining x_i and the selected neighbor [21,40,44,47]. This process is clearly described by Algorithm 1.…”

Section: Smote Methods For Data Balancing (mentioning)

confidence: 99%
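
The SMOTE steps quoted above (find the k nearest minority-class neighbours of an instance, pick one at random, then interpolate on the segment joining the two) can be sketched roughly as follows. This is a minimal illustration, not the citing paper's Algorithm 1: the function name `smote_sample`, its parameters, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote_sample(minority, i, k=5, rng=random):
    """Return one synthetic point built from minority[i] (hypothetical helper).

    minority: list of feature vectors, all from the minority class
    i:        index of the seed instance x_i
    k:        number of nearest neighbours to consider
    """
    x_i = minority[i]
    # Step 1: the k nearest minority-class neighbours of x_i (x_i itself excluded).
    others = [j for j in range(len(minority)) if j != i]
    others.sort(key=lambda j: math.dist(x_i, minority[j]))
    neighbours = others[:k]
    # Step 2: choose one of those neighbours at random.
    neighbour = minority[rng.choice(neighbours)]
    # Step 3: place the synthetic point at a random position on the
    # segment joining x_i and the chosen neighbour.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x_i, neighbour)]


# Toy minority class (illustrative values only).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
synthetic = smote_sample(points, 0, k=2, rng=random.Random(0))
print(synthetic)
```

With `k=2`, the far-away point `[10.0, 10.0]` is never selected as a neighbour of the first instance, so the synthetic point stays between the seed and one of its two closest neighbours.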

“…Ensemble-based methods consist of a combination of several independent basic classifiers that are in most cases decision trees (DT) but can also be artificial neural networks (ANN) or support vector machine (SVM), k-nearest neighbours (k-NN) or naive Bayes (NB) [21]. Each of these independent weak learners provides an alternative prediction of the whole problem and the final prediction results in a combination (usually by weighted or unweighted vote) of these alternative predictions [50].…”

Section: Modelling and Prediction With Ensemble Methods (mentioning)

confidence: 99%
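
The combination step described in the statement above, where each base learner casts a prediction and the ensemble takes a weighted or unweighted vote, might look like the sketch below. The function name `vote`, the labels, and the weights are assumptions chosen for illustration; the statement does not specify a particular voting scheme.

```python
from collections import Counter


def vote(predictions, weights=None):
    """Combine base-learner predictions by (optionally weighted) majority vote.

    predictions: one predicted label per base learner
    weights:     optional per-learner weights; equal weights if omitted
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    # The label with the largest accumulated weight wins.
    return max(tally, key=tally.get)


# Three hypothetical base learners, unweighted vote.
print(vote(["glaucoma", "healthy", "glaucoma"]))
# Same idea with the first learner trusted three times as much.
print(vote(["glaucoma", "healthy", "healthy"], weights=[3.0, 1.0, 1.0]))
```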

“…A concrete example remains its best performance at the Netflix challenge [39] which has made ensemble methods very famous and highly recommended for the scientific community. Several research contributions have been done to improve the ensemble methods which currently remain a major challenger of deep learning which is most used [21]. Recently, many studies [40][41][42][43] have shown that the combination of ensemble models with preprocessing techniques improves performance in modeling of the unbalanced classification problem.…”

Section: The Proposed Approach (mentioning)

confidence: 99%

“…Standardization is an additional step in dealing with this problem but it has an adverse or favorable effect on the performance of certain algorithms that are said to be unstable to variable scales; knowing that a very large scale difference is more favourable for overfitting. Choosing ensemble methods overcomes this problem without the need to normalize the data [21]. The second contribution of this paper is to use the synthetic minority oversampling technique (SMOTE) method to balance the data while the third one is the building of stable and better performing models based on an ensemble approach.…”

Section: Introduction (mentioning)

confidence: 99%
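
One way to see why ensembles may not need normalized inputs, at least when the base learners are decision trees: a tree splits on a threshold, so only the ordering of the values of a feature matters, and rescaling the feature by a positive factor leaves the resulting partition unchanged. The sketch below assumes a single-feature decision stump; `side_error`, `best_split`, and the toy data are hypothetical names and values, and the argument would not carry over to scale-sensitive base learners such as k-NN or SVM.

```python
from collections import Counter


def side_error(indices, labels):
    """Misclassifications if this side predicts its majority label."""
    if not indices:
        return 0
    counts = Counter(labels[i] for i in indices)
    return len(indices) - max(counts.values())


def best_split(values, labels):
    """Return the indices sent left by the threshold split with fewest errors."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_err, best_left = None, None
    for cut in range(1, len(order)):
        # A threshold can only fall between two distinct values.
        if values[order[cut - 1]] == values[order[cut]]:
            continue
        left, right = order[:cut], order[cut:]
        err = side_error(left, labels) + side_error(right, labels)
        if best_err is None or err < best_err:
            best_err, best_left = err, frozenset(left)
    return best_left


raw = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
scaled = [v * 1000.0 for v in raw]  # same feature on a much larger scale

# The stump separates the same samples before and after rescaling.
print(best_split(raw, labels) == best_split(scaled, labels))
```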


Towards Explainable Machine Learning for Bank Churn Prediction Using Data Balancing and Ensemble-Based Methods

et al. 2022

Self Cite


The diversity of data collected on social networks and digital interfaces has increased sharply, raising the problem of heterogeneous variables that are often unfavourable to classification algorithms. Despite significant improvements in the efficiency of machine learning (ML) and predictive analysis for classification in customer relationship management (CRM) systems, their performance remains limited by heterogeneous data processing, class imbalance, and feature scales. The impact is greater for simple ML methods, which also often suffer from over-fitting. This paper proposes a succinct and detailed ML model-building process, including cross-validation, that combines SMOTE for data balancing with ensemble methods for modelling. In the experiments conducted, the random forest (RF) model yielded the best performance, 0.86 in terms of both accuracy and F1-score, using balanced data. This confirms the literature on the topic, which shows that RF is among the most effective algorithms for customer predictive classification. The constructed and optimized models were interpreted using Shapley values and feature importance analysis, which show that the “age” feature was the most significant and “HasCrCard” the least. This process has proven effective in bridging previously reported research gaps, and the resulting model should be used to support bank customer loyalty decision-making.

