2019
DOI: 10.14778/3342263.3342273

An experimental evaluation of large scale GBDT systems

Abstract: Gradient boosting decision tree (GBDT) is a widely-used machine learning algorithm in both data analytic competitions and real-world industrial applications. Further, driven by the rapid increase in data volume, efforts have been made to train GBDT in a distributed setting to support large-scale workloads. However, we find it surprising that the existing systems manage the training dataset in different ways, but none of them have studied the impact of data management. To that end, this paper aims to study the …

Cited by 24 publications (9 citation statements)
References 45 publications
“…We have also used data parallelism to implement LambdaML. Other research topics in distributed ML include compression [6,7,52,53,93,96,97,101], decentralization [28,41,59,65,90,91,100], synchronization [4,19,26,46,66,68,87,94,102], straggler [8,56,83,89,98,105], data partition [1,3,36,55,77], etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…ThunderGBM also uses the centralized design for a machine with multiple GPUs, but does not support distributed computing. [Fu et al, 2019]. Feature based partitioning does not require the whole histogram to be sent to other machines.…”
Section: Network Communication (mentioning)
confidence: 99%
“…Empirically, 20 buckets are used in popular GBDT frameworks [13,18]. The split finding algorithm is depicted in Algorithm 1.…”
Section: Gradient Boosting Decision Tree (GBDT) (mentioning)
confidence: 99%
“…• Its efficiency should be close to traditional distributed ML [8,13,18,20], i.e., the number of cryptographic operations should be minimized.…”
Section: Problem Statement (mentioning)
confidence: 99%