CatBoost for Big Data: an Interdisciplinary Review

Hancock, John; Khoshgoftaar, Taghi M.

doi:10.21203/rs.3.rs-54646/v1

Cited by 7 publications

(6 citation statements)

References 43 publications


“…It is a high-performance open source library for gradient boosting on decision tree. It gives outstanding output, and it performs quickly by the help of GPU, even on a large dataset [23].…”

Section: Catboost (mentioning)

confidence: 99%


Analysis of Ensembled Learning using Classical and Quantum Boosting Approaches for Diabetes Mellitus Prediction

Jha, Adhikari

2024

Preprint


The paper focuses on the implementation of quantum machine learning and artificial intelligence techniques in diagnosing diseases, specifically diabetes. It proposes an ensemble approach that combines classical algorithms with quantum processing unit (QPU)-based algorithms to improve model performance. The diabetes dataset used in the study is obtained from the Centers for Disease Control and Prevention (CDC) repository, and the goal is to classify patients as either diabetic or non-diabetic. The ensemble algorithms examined in the study include Voting classifier, Adaboost, Xgboost, Catboost, and QPU-based Qboost. While Qboost demonstrates some quantum speedup, its performance is not satisfactory. Therefore, the proposed hybrid model is developed to enhance the performance metrics. The hybrid model achieves an average accuracy, precision, recall, F1 score, and AUC score of 0.89, 0.85, 0.95, 0.90, and 0.96, respectively, on the diabetes dataset. In comparison, the top-performing Adaboost algorithm achieves an average accuracy, precision, recall, F1 score, and AUC score of 0.94, 0.91, 0.98, 0.94, and 0.98, respectively. The paper concludes that while quantum computing (QC) significantly improves computation speed, it comes at a slight cost of a 5% decrease in classification metrics and 0.186 in the AUC score. Additionally, the study suggests that further development of quantum computing hardware will enhance overall performance metrics.


“…Gradient boosting now takes an additive pattern that recursively builds a greedy sequence of interpolation F_t of a given loss function L(y_i, F_t). Assuming the function f(x), we can enhance estimate of y_i by finding another function [23].…”

Section: Catboost (mentioning)

confidence: 99%
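
The additive pattern described in the statement above can be illustrated with a minimal sketch. It assumes squared loss, a single numeric feature, and decision stumps as weak learners; the function names (`fit_stump`, `boost`, `predict`) and default parameters are hypothetical and chosen only for illustration. It is a generic gradient boosting loop, not CatBoost's implementation and not the citing paper's code.

```python
# Minimal gradient boosting sketch: squared loss, 1-D decision stumps.
# Illustrative only; all names are hypothetical, and this is not CatBoost.

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals.

    Returns (threshold, left_value, right_value), where left_value is the
    mean residual for x <= threshold and right_value for x > threshold.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:
        # All x values are identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build F_t = F_{t-1} + learning_rate * h_t, one stump per round."""
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimizes squared loss
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return (f0, learning_rate, stumps)


def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each round fits a new function to what the current model still gets wrong, then adds a damped copy of it to the running estimate. With zero rounds the model is just the mean of the targets, so comparing `mean_squared_error` at zero and at several rounds shows the effect of the added functions.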


“…The second method considered in the article is a machine learning method based on gradient boosting (Catboost), which makes it possible to create many algorithms (decision trees) that, in turn, can learn to make decisions and build forecasts from data (Brink et al, 2016; Hancock and Khoshgoftaar, 2020). Machine learning is a comparatively new method, and it has rarely been applied in demographic science before (Соловьев и Соловьев, 2018, с.…”

Section: Theoretical foundations of the study (unclassified)

Forecasting fertility demographic indicators: inertial method versus machine learning method

Zubarev, Fedulova

2021

Ars Administrandi


Introduction: the article compares methods for forecasting demographic indicators in the area of fertility. Objective: to determine the accuracy of methods for forecasting fertility-related demographic indicators by comparing forecast values obtained with the inertial method and with the machine learning method (using data from Perm Krai). Methods: statistical analysis, graphical analysis, the inertial forecasting method, and a machine learning method based on gradient boosting (Catboost), using the Google Colab environment and the Python programming language, version 3.7. Results: forecast values were obtained for the indicator "absolute number of births". The mean deviation of forecast values from actual values was 11.9% for the inertial forecasting method and 19.85% for the machine learning method. The specifics of how forecast values are formed under each method were identified, and the high deviation values were explained. Conclusions: the inertial forecasting method proved more accurate than the method…


“…• Given a set of selected features, recommend the ML algorithm(s) able to induce the best predictive model, which can be a set of algorithms, each one inducing a model, and combine these models into an ensemble (P_ml), recommending the best algorithm. Ensemble methods can boost the performance of simple classifiers (e.g., using multiple prediction models for solving the same problem) and have proven their effectiveness in bioinformatics (LIU et al, 2020; HANCOCK; KHOSHGOFTAAR, 2020; HE et al, 2022).…”

Section: Metalearning (mentioning)

confidence: 99%

“…We chose these ML algorithms because they have good predictive performance and induce interpretable predictive models, allowing the understanding of the internal decision-making process (BONIDIA et al, 2020a). The algorithms are widely adopted in the bioinformatics literature (LIU et al, 2020; HANCOCK; KHOSHGOFTAAR, 2020; HE et al, 2022).…”

Section: Bioautoml - Selection and Recommendation (mentioning)

confidence: 99%

BioAutoML: Democratizing Machine Learning in Life Sciences

Bonidia

