2020
DOI: 10.1007/978-3-030-46150-8_35
Fast Gradient Boosting Decision Trees with Bit-Level Data Structures

Abstract: A gradient boosting decision tree model is a powerful machine learning method that iteratively constructs decision trees to form an additive ensemble model. The method uses the gradient of the loss function to improve the model at each iteration step. Inspired by the database literature, we exploit bitset and bitslice data structures to improve the run-time efficiency of learning the trees. We can use these structures in two ways. First, they can represent the input data itself. Second, they can store…
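The abstract only sketches the two uses of these structures, so here is a minimal, self-contained Python illustration of both: a bitset (one bit per instance) representing which rows reach a tree node, and a bitslice (one bitset per bit of a quantized gradient) for summing gradients with popcounts. All names, the toy data, and the 2-bit quantization are illustrative assumptions, not the paper's implementation.

    # A minimal sketch of bitsets and bitslices, using Python ints as bit vectors.
    # Illustrative only; not the authors' code.

    def popcount(x: int) -> int:
        return bin(x).count("1")  # int.bit_count() on Python >= 3.10

    # Bitset: one bit per training instance; bit i set means row i is present.
    n = 8
    root = (1 << n) - 1            # all 8 instances reach the root node
    feature = 0b10110100           # a boolean input column, also stored as a bitset

    # Splitting on "feature == 1" is a single AND per child:
    left = root & feature
    right = root & ~feature
    print(popcount(left), popcount(right))   # instance counts per child

    # Bitslice: gradients quantized to k bits, stored column-wise.
    # slices[j] has bit i set iff bit j of row i's quantized gradient is 1.
    grads = [3, 1, 2, 0, 3, 1, 0, 2]         # toy 2-bit quantized gradients
    k = 2
    slices = [0] * k
    for i, g in enumerate(grads):
        for j in range(k):
            if (g >> j) & 1:
                slices[j] |= 1 << i

    def grad_sum(instances: int) -> int:
        # Gradient sum over any bitset of instances via k AND + popcount passes.
        return sum(popcount(slices[j] & instances) << j for j in range(k))

    assert grad_sum(root) == sum(grads)
    print(grad_sum(left))          # gradient statistic for the left child

The point of the bitslice layout is that a gradient sum over any instance set costs only k AND-plus-popcount passes, regardless of how the set was produced.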

Cited by 11 publications (14 citation statements). References 10 publications.
“…The key difference is that LightGBM eliminates some instances with small gradients and combines correlated attributes, and CatBoost trains so-called oblivious trees, where an identical split point is used across an entire level of a tree. Recent studies exploit both value-level and feature-level parallelism at different stages of training [Peng et al, 2019], and exploit bit-level optimization for GBDTs [Devos et al, 2019]. The authors showed that the proposed algorithms can outperform CPU-based XGBoost and LightGBM.…”
Section: GBDT Training on CPUs
confidence: 99%
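To make the oblivious-tree remark concrete: every node on one level of such a tree tests the same (feature, threshold) pair, so a depth-d tree reduces to d comparisons whose outcomes form a leaf index. A minimal sketch with assumed names, not CatBoost's actual code:

    def oblivious_leaf(x, splits):
        # splits: one (feature, threshold) pair per level of the tree.
        idx = 0
        for feature, threshold in splits:
            idx = (idx << 1) | (x[feature] > threshold)
        return idx   # index into a flat array of 2**len(splits) leaf values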
“…Average and standard deviation of clicks generated by every publisher where the same fields (referredurl, deviceua, usercountry, numericip, campaignid) are duplicated within one minute; average and standard deviation of clicks generated by every publisher where the same fields are duplicated within one minute during the night (20:00–24:00 and 00:00–06:00), morning (06:00–12:00), afternoon (12:00–17:00), and evening (17:00–20:00). Gradient boosting (Devos et al, 2019) is a machine learning approach that creates a predictive model in the form of an ensemble of weak prediction models. In supervised learning, given an input variable x, an output variable y can be defined through a joint probability distribution P(x, y).…”
Section: Feature Engineering
confidence: 99%
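The quoted definition is abstract; for squared loss the negative gradient is just the residual y − F(x), so one boosting round reduces to fitting a small tree to residuals. A minimal sketch, assuming scikit-learn is available; names and parameters are illustrative, not the cited paper's implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gbdt(X, y, n_trees=50, lr=0.1, depth=3):
        base = y.mean()                      # F_0: constant initial model
        pred = np.full(len(y), base)
        trees = []
        for _ in range(n_trees):
            residual = y - pred              # negative gradient of (y - F)^2 / 2
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
            pred += lr * tree.predict(X)     # additive update: F += lr * h
            trees.append(tree)
        return base, trees

    def predict(base, trees, X, lr=0.1):
        return base + lr * sum(t.predict(X) for t in trees)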
“…Can we find a two-action sequence involving a backward pass in the middle of the field that results in a game state with a probability greater than 10% of scoring in the near future? This is a relevant question, as the soccer analytics community is interested in understanding the usefulness of backward passes far away from the goal. Our method proved that pass-pass sequences in the midfield cannot have a probability greater than 10%.…”
Section: Querying the Model
confidence: 99%
“…In this work, we focus on developing an approach that can answer such questions about additive tree ensembles, which include random forests [3] and gradient boosting trees (e.g., [4,15,7]). This represents a powerful and widely used family of machine learning algorithms.…”
Section: Introduction
confidence: 99%