2020
DOI: 10.1007/978-3-030-46150-8_35
Fast Gradient Boosting Decision Trees with Bit-Level Data Structures

Abstract: A gradient boosting decision tree model is a powerful machine learning method that iteratively constructs decision trees to form an additive ensemble model. The method uses the gradient of the loss function to improve the model at each iteration step. Inspired by the database literature, we exploit bitset and bitslice data structures to improve the run-time efficiency of learning the trees. We can use these structures in two ways. First, they can represent the input data itself. Second, they can store…
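The abstract only sketches the two uses of these structures, so here is a minimal, self-contained Python illustration of both: a bitset (one bit per instance) representing which rows reach a tree node, and a bitslice (one bitset per bit of a quantized gradient) for summing gradients with popcounts. All names, the toy data, and the 2-bit quantization are illustrative assumptions, not the paper's implementation.

    # A minimal sketch of bitsets and bitslices, using Python ints as bit vectors.
    # Illustrative only; not the authors' code.

    def popcount(x: int) -> int:
        return bin(x).count("1")  # int.bit_count() on Python >= 3.10

    # Bitset: one bit per training instance; bit i set means row i is present.
    n = 8
    root = (1 << n) - 1            # all 8 instances reach the root node
    feature = 0b10110100           # a boolean input column, also stored as a bitset

    # Splitting on "feature == 1" is a single AND per child:
    left = root & feature
    right = root & ~feature
    print(popcount(left), popcount(right))   # instance counts per child

    # Bitslice: gradients quantized to k bits, stored column-wise.
    # slices[j] has bit i set iff bit j of row i's quantized gradient is 1.
    grads = [3, 1, 2, 0, 3, 1, 0, 2]         # toy 2-bit quantized gradients
    k = 2
    slices = [0] * k
    for i, g in enumerate(grads):
        for j in range(k):
            if (g >> j) & 1:
                slices[j] |= 1 << i

    def grad_sum(instances: int) -> int:
        # Gradient sum over any bitset of instances via k AND + popcount passes.
        return sum(popcount(slices[j] & instances) << j for j in range(k))

    assert grad_sum(root) == sum(grads)
    print(grad_sum(left))          # gradient statistic for the left child

The point of the bitslice layout is that a gradient sum over any instance set costs only k AND-plus-popcount passes, regardless of how the set was produced.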

Cited by 11 publications (14 citation statements). References 10 publications.
“…The key difference is that LightGBM eliminates some instances with small gradients and combines correlated attributes, and CatBoost trains so-called oblivious trees, where an identical split point is used across an entire level of a tree. Recent studies exploit both value-level and feature-level parallelism at different stages of training [Peng et al, 2019], and exploit bit-level optimization for GBDTs [Devos et al, 2019]. The authors showed that the proposed algorithms can outperform CPU-based XGBoost and LightGBM.…”
Section: GBDT Training on CPUs
confidence: 99%
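To make the oblivious-tree remark concrete: every node on one level of such a tree tests the same (feature, threshold) pair, so a depth-d tree reduces to d comparisons whose outcomes form a leaf index. A minimal sketch with assumed names, not CatBoost's actual code:

    def oblivious_leaf(x, splits):
        # splits: one (feature, threshold) pair per level of the tree.
        idx = 0
        for feature, threshold in splits:
            idx = (idx << 1) | (x[feature] > threshold)
        return idx   # index into a flat array of 2**len(splits) leaf values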
“…Average and standard deviation of clicks generated by every publisher where the same fields (referredurl, deviceua, usercountry, numericip, campaignid) are duplicated within one minute; average and standard deviation of clicks generated by every publisher where the same fields are duplicated within one minute during the night (20:00–24:00 and 00:00–06:00), morning (06:00–12:00), afternoon (12:00–17:00), and evening (17:00–20:00). Gradient boosting (Devos et al, 2019) is a machine learning approach that creates a predictive model in the form of an ensemble of weak prediction models. In supervised learning, given an input variable x, an output variable y can be defined through a joint probability distribution P(x, y).…”
Section: Feature Engineering
confidence: 99%
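The quoted definition is abstract; for squared loss the negative gradient is just the residual y − F(x), so one boosting round reduces to fitting a small tree to residuals. A minimal sketch, assuming scikit-learn is available; names and parameters are illustrative, not the cited paper's implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gbdt(X, y, n_trees=50, lr=0.1, depth=3):
        base = y.mean()                      # F_0: constant initial model
        pred = np.full(len(y), base)
        trees = []
        for _ in range(n_trees):
            residual = y - pred              # negative gradient of (y - F)^2 / 2
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
            pred += lr * tree.predict(X)     # additive update: F += lr * h
            trees.append(tree)
        return base, trees

    def predict(base, trees, X, lr=0.1):
        return base + lr * sum(t.predict(X) for t in trees)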
“…Can we find a two-action sequence involving a backward pass in the middle of the field that results in a game state with a probability greater than 10% of scoring in the near future? This is a relevant question, as the soccer analytics community is interested in understanding the usefulness of backward passes far away from the goal. Our method proved that pass-pass sequences in the midfield cannot have a probability greater than 10%.…”
Section: Querying the Model
confidence: 99%
“…In this work, we focus on developing an approach that can answer such questions about additive tree ensembles, which include random forests [3] and gradient boosting trees (e.g., [4,15,7]). This represents a powerful and widely used family of machine learning algorithms.…”
Section: Introduction
confidence: 99%