2020
DOI: 10.21203/rs.3.rs-54646/v1
Preprint

CatBoost for Big Data: an Interdisciplinary Review

Abstract: Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity t…

Cited by 7 publications (6 citation statements)
References 43 publications
“…It is a high-performance open source library for gradient boosting on decision tree. It gives outstanding output, and it performs quickly by the help of GPU, even on a large dataset [23].…”
Section: Catboost
Citation type: mentioning
confidence: 99%
“…Gradient boosting now takes an additive pattern that recursively builds a greedy sequence of interpolation F_t of a given loss function L(y_i, F_t). Assuming the function f(x), we can enhance estimate of y_i by finding another function [23].…”
Section: Catboost
Citation type: mentioning
confidence: 99%
“…The second method considered in the article is a machine learning method based on gradient boosting (Catboost), which makes it possible to build many algorithms (decision trees) that, in turn, can learn to make decisions and produce forecasts from data (Brink et al, 2016; Hancock and Khoshgoftaar, 2020). Machine learning is a comparatively new method, and it has rarely been applied in demographic research before (Соловьев and Соловьев, 2018, p.…”
Section: Theoretical foundations of the study
Citation type: unclassified
“…• Given a set of selected features, recommend the ML algorithm(s) able to induce the best predictive model, which can be a set of algorithms, each one inducing a model, and combine these models into an ensemble (P_ml), recommending the best algorithm. Ensemble methods can boost the performance of simple classifiers (e.g., using multiple prediction models for solving the same problem) and have proven their effectiveness in bioinformatics (LIU et al, 2020; HANCOCK; KHOSHGOFTAAR, 2020; HE et al, 2022).…”
Section: Metalearning
Citation type: mentioning
confidence: 99%
“…We chose these ML algorithms because they have good predictive performance and induce interpretable predictive models, allowing the understanding of the internal decision-making process (BONIDIA et al, 2020a). The algorithms are widely adopted in the bioinformatics literature (LIU et al, 2020; HANCOCK; KHOSHGOFTAAR, 2020; HE et al, 2022).…”
Section: Bioautoml - Selection and Recommendation
Citation type: mentioning
confidence: 99%