Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539034

AutoShard: Automated Embedding Table Sharding for Recommender Systems

Abstract: Embedding learning is an important technique in deep recommendation models for mapping categorical features to dense vectors. However, embedding tables often demand an extremely large number of parameters, making them the storage and efficiency bottleneck. Distributed training solutions partition the embedding tables across multiple devices. However, the tables can easily cause load imbalances if not carefully partitioned. This is a significant design challenge of distributed systems…
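To make the load-balancing problem concrete, below is a minimal sketch of cost-based sharding in Python. The greedy longest-processing-time heuristic and the per-table costs are illustrative assumptions, not the paper's method: AutoShard itself learns a neural cost model and searches placements with deep reinforcement learning.

```python
# Minimal sketch of cost-balanced embedding-table sharding.
# The greedy heuristic and the per-table costs are illustrative
# assumptions, not AutoShard's actual algorithm (which learns a
# neural cost model and searches placements with RL).
import heapq

def greedy_shard(table_costs: dict[str, float], num_devices: int) -> list[list[str]]:
    """Assign each table to the currently least-loaded device."""
    # Min-heap of (accumulated_cost, device_id).
    heap = [(0.0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    plan: list[list[str]] = [[] for _ in range(num_devices)]
    # Largest tables first (LPT rule) tightens the balance bound.
    for table, cost in sorted(table_costs.items(), key=lambda kv: -kv[1]):
        load, device = heapq.heappop(heap)
        plan[device].append(table)
        heapq.heappush(heap, (load + cost, device))
    return plan

if __name__ == "__main__":
    costs = {"user_id": 9.0, "item_id": 7.5, "ad_id": 3.0, "geo": 1.2}
    print(greedy_shard(costs, num_devices=2))
```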

Cited by 14 publications (7 citation statements)
References 67 publications (173 reference statements)
“…ScratchPipe (Kwon & Rhu, 2022) and RecShard (Sethi et al, 2022) tackle the problem of embedding access latency in hybrid CPU-GPU training systems. ScratchPipe addresses the problem with a run-ahead GPU-side cache that tries to make all embedding accesses hit in local GPU HBM, while RecShard uses a mixed-integer linear program and the per-embedding-table access distributions to statically place the most frequently accessed rows in GPU HBM. AutoShard (Zha et al, 2022) focuses on sharding embedding tables in a multi-GPU-only training system, and uses deep reinforcement learning and a neural-network-based cost model to make its placement decisions.…”
Section: Related Work
confidence: 99%
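For intuition on the frequency-based placement that RecShard formalizes as a mixed-integer linear program, here is a hedged sketch that simply ranks rows by access count and keeps the hottest ones within an HBM row budget. The Zipf-distributed counts and the `hbm_budget_rows` parameter are hypothetical, chosen only to illustrate how skewed access patterns make static placement effective.

```python
# Illustrative sketch of frequency-based row placement in the spirit
# of RecShard: keep the hottest rows in fast GPU HBM and spill the
# rest to host memory. The capacity budget and frequency counts are
# hypothetical; RecShard itself solves a mixed-integer linear program.
import numpy as np

def place_rows(access_counts: np.ndarray, hbm_budget_rows: int):
    """Return (hbm_rows, host_rows) index arrays by access frequency."""
    order = np.argsort(-access_counts)        # hottest rows first
    hbm_rows = order[:hbm_budget_rows]
    host_rows = order[hbm_budget_rows:]
    return hbm_rows, host_rows

counts = np.random.zipf(a=1.5, size=10_000)   # skewed, power-law accesses
hbm, host = place_rows(counts, hbm_budget_rows=1_000)
print(f"HBM rows capture {counts[hbm].sum() / counts.sum():.1%} of accesses")
```

With a skewed (power-law) access distribution, a small HBM budget typically captures the large majority of lookups, which is the premise behind static hot-row placement.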
“…Furthermore, while the sharding problem has been increasingly explored in recent works due to its importance (Adnan et al, 2021; Lui et al, 2021; Sethi et al, 2022; Zha et al, 2022), they all, to our knowledge, assume that the embedding tables to be sharded are either one-hot (at most one embedding row per table is accessed per training sample) or sum-pooled (all embedding rows accessed within a table by a training sample are aggregated via summation before proceeding through the model).…”
Section: Introduction
confidence: 99%
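The one-hot versus sum-pooled distinction this quote draws maps directly onto PyTorch's `nn.Embedding` and `nn.EmbeddingBag(mode="sum")`. The sketch below uses illustrative table sizes and index batches to show the two access patterns side by side.

```python
# Sketch contrasting the two embedding access patterns described
# above, using PyTorch. Table sizes and indices are illustrative.
import torch
import torch.nn as nn

num_rows, dim = 1000, 16

# One-hot: at most one row per table per sample -> plain Embedding.
one_hot = nn.Embedding(num_rows, dim)
ids = torch.tensor([3, 42, 7])                # one id per sample
print(one_hot(ids).shape)                     # (3, 16)

# Sum-pooled: many rows per sample, summed -> EmbeddingBag(mode="sum").
pooled = nn.EmbeddingBag(num_rows, dim, mode="sum")
flat_ids = torch.tensor([3, 42, 7, 9, 11])    # variable-length bags
offsets = torch.tensor([0, 2])                # bag 0: [3, 42]; bag 1: [7, 9, 11]
print(pooled(flat_ids, offsets).shape)        # (2, 16)
```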
“…Reinforcement Learning. Reinforcement learning has shown strong performance in many reward-driven tasks [38,69,45,34,46,47,19,64,67,23,65,63,61,58,59,62,60,26,10]. It has also been applied to AutoML search [71].…”
Section: Related Work
confidence: 99%