2022
DOI: 10.29408/edumatic.v6i1.5613
Comparison of Naïve Bayes Algorithm and XGBoost on Local Product Review Text Classification

Abstract: Online reviews are critical in supporting purchasing decisions: with the growth of e-commerce, fake reviews have multiplied, and more and more consumers worry about being deceived when shopping online. Sentiment analysis can be applied to marketplace product reviews. This study compares the two classifiers Naïve Bayes and XGBoost using two vector-space representations, word2vec and TF-IDF. The methods used in this research are data collection, data cleaning, data labelling, data pre-proce…
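A minimal sketch of the kind of comparison the abstract describes. The reviews, labels, and hyperparameters below are illustrative stand-ins (the paper's dataset is not shown here), and scikit-learn's GradientBoostingClassifier substitutes for XGBoost; only the TF-IDF pipeline shape matches the abstract.

```python
# Illustrative sketch only: tiny made-up reviews, default hyperparameters,
# and scikit-learn's GradientBoostingClassifier standing in for XGBoost.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import GradientBoostingClassifier

reviews = [
    "great product fast delivery", "love it works perfectly",
    "excellent value highly recommend", "terrible quality broke quickly",
    "waste of money very bad", "awful packaging item damaged",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF vector space, as in the paper's second feature representation.
vec = TfidfVectorizer()
X = vec.fit_transform(reviews)

nb = MultinomialNB().fit(X, labels)
xgb_stand_in = GradientBoostingClassifier(random_state=0).fit(X.toarray(), labels)

query = vec.transform(["excellent product highly recommend"])
print(nb.predict(query)[0], xgb_stand_in.predict(query.toarray())[0])
```

Both models predict a sentiment label for the unseen review; with the paper's real data one would compare their accuracy on a held-out split.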

Cited by 8 publications (8 citation statements)
References 10 publications
“…The XGBoost approach was found to be more effective in solving classification problems by Hendrawan et al. (2022) in their study on e-commerce product reviews. Elmitwally (2020) reported that XGBoost achieved the highest prediction performance on his classification problem.…”
Section: Methods
confidence: 99%
“…It is particularly suitable for situations where the data distribution is not explicitly known or may exhibit non-standard characteristics. XGBoost employs a boosting technique to improve model performance sequentially by correcting errors (Hendrawan et al., 2022; Arif Ali et al., 2023). It has robustness in handling linear and non-linear relationships, including missing data (Hendrawan et al., 2022; Arif Ali et al., 2023).…”
Section: Methods
confidence: 99%
“…XGBoost employs a boosting technique to improve model performance sequentially by correcting errors (Hendrawan et al., 2022; Arif Ali et al., 2023). It has robustness in handling linear and non-linear relationships, including missing data (Hendrawan et al., 2022; Arif Ali et al., 2023). RF combines multiple decision trees to improve overall prediction accuracy and reduce overfitting (Belgiu and Drăguţ, 2016).…”
Section: Methods
confidence: 99%
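The random-forest claim in the passage — combining many trees to improve accuracy and curb overfitting — can be sketched by comparing a single decision tree against a forest on held-out synthetic data. Dataset sizes and parameters here are arbitrary illustrations, not from the cited works.

```python
# Illustrative only: single decision tree vs. random forest on held-out data,
# showing the variance-reduction effect of averaging many trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("tree:", tree.score(X_te, y_te), "forest:", forest.score(X_te, y_te))
```

A single fully grown tree tends to fit noise in the training split; the forest averages 200 trees grown on bootstrap samples, which typically yields the higher test score.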
“…Future Proofing: Even if the XGBRFClassifier does not significantly outperform Naïve Bayes on our current dataset, it may be more adaptable to future changes in data distribution or feature sets [116]. Naïve Bayes is relatively simple and may not handle data shifts or feature additions as gracefully as the XGBRFClassifier [117].…”
confidence: 96%