Research and application of XGBoost in imbalanced data
2022
DOI: 10.1177/15501329221106935

Abstract: As a new and efficient ensemble learning algorithm, XGBoost has been widely applied for its many advantages, but its classification performance on imbalanced data is often not ideal. To address this problem, an attempt is made to optimize the regularization term of XGBoost, and a classification algorithm based on mixed sampling and ensemble learning is proposed. The main idea is to combine SVM-SMOTE over-sampling and EasyEnsemble under-sampling techniques for data processing, and then obtain t…
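The abstract is cut off above, but it names the three ingredients: SVM-SMOTE over-sampling, EasyEnsemble under-sampling, and XGBoost as the learner. Below is a minimal Python sketch of that mixed-sampling combination using imbalanced-learn and xgboost; the synthetic data, hyperparameters, and the way the two samplers are chained are illustrative assumptions, not the authors' reported configuration.

    # Sketch only: mixed sampling (SVM-SMOTE + EasyEnsemble) around XGBoost,
    # as outlined in the abstract. All data and parameters are illustrative.
    from imblearn.over_sampling import SVMSMOTE
    from imblearn.ensemble import EasyEnsembleClassifier
    from xgboost import XGBClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Toy imbalanced data set (roughly 9:1 majority-to-minority ratio).
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.9, 0.1], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.25, random_state=42)

    # Step 1: SVM-SMOTE over-samples the minority class near the SVM decision boundary.
    X_res, y_res = SVMSMOTE(random_state=42).fit_resample(X_train, y_train)

    # Step 2: EasyEnsemble trains XGBoost learners on balanced, under-sampled bags.
    # (Older imbalanced-learn releases use the keyword base_estimator instead of estimator.)
    clf = EasyEnsembleClassifier(
        n_estimators=10,
        estimator=XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss"),
        random_state=42)
    clf.fit(X_res, y_res)

    print(classification_report(y_test, clf.predict(X_test)))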

Cited by 66 publications (27 citation statements)
References 11 publications
“…Thus, models like SVR and XGBoost might not be as effective in modeling patterns without extensive feature engineering. 56,57 …”
Section: Discussion
confidence: 99%
“…These nonlinear relationships can be explored with ML algorithms, especially those that are capable of detecting complicated interactions, such as random forest and XGBoost. 189,190 Furthermore, sensitivity analysis can aid in determining the generalizability and durability of an ML model. Researchers can determine the stability and reliability of the predictions by examining the performance of the model under different scenarios and by perturbing input variables.…”
Section: Sensitivity of Machine Learning Models to Input Variables
confidence: 99%
“…Some features of the biomass may interact synergistically, resulting in nonlinear effects on the synthesis of the product. These nonlinear relationships can be explored with ML algorithms, especially those that are capable of detecting complicated interactions, such as random forest and XGBoost. 189,190 Furthermore, sensitivity analysis can aid in determining the generalizability and durability of an ML model.…”
Section: Feature Importance and Sensitivity Analysis
confidence: 99%
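The two excerpts above describe checking model stability by perturbing input variables. A minimal sketch of that kind of perturbation-based sensitivity check follows; the XGBoost regressor, synthetic data, and 5% perturbation size are illustrative assumptions rather than details taken from the cited works.

    # Sketch only: perturb each feature slightly and measure how much the
    # model's predictions move; larger shifts indicate higher sensitivity.
    import numpy as np
    from sklearn.datasets import make_regression
    from xgboost import XGBRegressor

    X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=0)
    model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)

    baseline = model.predict(X)
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] += 0.05 * X[:, j].std()  # shift feature j by 5% of its std
        shift = np.abs(model.predict(X_pert) - baseline).mean()
        print(f"feature {j}: mean absolute prediction shift = {shift:.4f}")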
“…Construction of Training, Validation, and Test Data Sets. Common approaches to address imbalanced data include undersampling, 24 oversampling, 25 ensemble methods, 26 etc. Undersampling techniques may cause information loss, and oversampling may lead to overfitting.…”
Section: Benchmark Data Sets
confidence: 99%