2020
DOI: 10.48550/arxiv.2011.02084
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
10
0

Year Published

2020
2020
2023
2023

Publication Types

Select...
5

Relationship

4
1

Authors

Journals

citations
Cited by 5 publications
(10 citation statements)
references
References 0 publications
0
10
0
Order By: Relevance
“…Nonetheless, the multi-stage GPU-CPU design plays an important role. Recent work shows production-scale recommendation model sizes are growing rapidly-by an order of magnitude in just three years [39]. For production-scale models that are larger than the DRAM capacity available on GPUs (e.g., ∼ 15GB on NVIDIA T4), designers will need to decompose models into multiple stages.…”
Section: Mapping Multi-stage Pipelines To Heterogeneous Systemsmentioning
confidence: 99%
See 3 more Smart Citations
“…Nonetheless, the multi-stage GPU-CPU design plays an important role. Recent work shows production-scale recommendation model sizes are growing rapidly-by an order of magnitude in just three years [39]. For production-scale models that are larger than the DRAM capacity available on GPUs (e.g., ∼ 15GB on NVIDIA T4), designers will need to decompose models into multiple stages.…”
Section: Mapping Multi-stage Pipelines To Heterogeneous Systemsmentioning
confidence: 99%
“…This is due to the limits of multi-tenant execution, under utilized hardware when separately exploiting data-and model-level parallelism across stages, and high PCIe data communication between stages. Given these limitations and the growing scale of personalized recommendation across Internet services [39,59,60], we use RecPipe to unlock the opportunities from multi-stage ranking by designing specialized hardware to provide high quality and infrastructure efficiency, in the following section.…”
Section: Mapping Multi-stage Pipelines To Heterogeneous Systemsmentioning
confidence: 99%
See 2 more Smart Citations
“…First, they enable important components and services across a wide breadth of domains, seeing widespread adoption at Facebook [8,[19][20][21]34], Google [12,15,23], Microsoft [18], Baidu [50], and many other hyperscale companies [41,51]. Secondly, training these models, which often consist of trillions of parameters [32,37], places enormous demands on the end-to-end training and data ingestion pipeline. Training a production recommendation system takes weeks, requiring numerous training jobs each using hundreds of distributed GPUs.…”
Section: Introductionmentioning
confidence: 99%