Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

Lui, Michael S.; Yetim, Yavuz; Özkan, Özgür; Zhao, Zhuoran; Tsai, Shin-Yeh; Wu, Carole-Jean; Hempstead, Mark

doi:10.48550/arxiv.2011.02084

Cited by 5 publications

(10 citation statements)

References 0 publications

“…Nonetheless, the multi-stage GPU-CPU design plays an important role. Recent work shows production-scale recommendation model sizes are growing rapidly, by an order of magnitude in just three years [39]. For production-scale models that are larger than the DRAM capacity available on GPUs (e.g., ∼15 GB on NVIDIA T4), designers will need to decompose models into multiple stages.…”

Section: Mapping Multi-stage Pipelines To Heterogeneous Systems (mentioning)

confidence: 99%

“…This is due to the limits of multi-tenant execution, underutilized hardware when separately exploiting data- and model-level parallelism across stages, and high PCIe data communication between stages. Given these limitations and the growing scale of personalized recommendation across Internet services [39,59,60], we use RecPipe to unlock the opportunities from multi-stage ranking by designing specialized hardware to provide high quality and infrastructure efficiency, in the following section.…”

Section: Mapping Multi-stage Pipelines To Heterogeneous Systems (mentioning)

confidence: 99%

“…So far we have analyzed the performance of RPAccel on open-source use-cases. However, recent literature shows production-scale recommendation models are rapidly growing in size, outpacing DRAM capacity and even reaching TBs in size [39]. One promising path to enabling future, production-scale models is to use higher capacity memories such as SSDs [11,49].…”

Section: RPAccel Evaluation On Future Models (mentioning)

confidence: 99%

“…Deep neural network (DNN) based recommendation systems constitute an overwhelming fraction of AI cycles in production data centers (e.g., Facebook, Google, Alibaba) [1,18,21,22,30,51,58,59,60]. To improve content personalization in a wide range of services (e.g., search, e-commerce, movie and video-streaming, social media), the size of production recommendation models has grown by over 10× between 2017 and 2020 [39,56,57].…”

Section: Introduction (mentioning)

confidence: 99%

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

Gupta, Hsia, Zhang et al. 2021

Preprint

Self Cite

Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to prior-art and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3× and 6×.

“…First, they enable important components and services across a wide breadth of domains, seeing widespread adoption at Facebook [8,19,20,21,34], Google [12,15,23], Microsoft [18], Baidu [50], and many other hyperscale companies [41,51]. Secondly, training these models, which often consist of trillions of parameters [32,37], places enormous demands on the end-to-end training and data ingestion pipeline. Training a production recommendation system takes weeks, requiring numerous training jobs each using hundreds of distributed GPUs.…”

Section: Introduction (mentioning)

confidence: 99%

Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training

Zhao, Agarwal, Basant et al. 2021

Preprint

Self Cite

RecSSD: near data processing for solid state drive based recommendation inference

Wilkening, Gupta, Hsia et al. 2021

Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Self Cite

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near data processing based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2× compared to using COTS SSDs across eight industry-representative models.

CCS Concepts: Hardware → External storage; Computer systems organization → Neural networks.
