2022
DOI: 10.48550/arxiv.2204.04903
Preprint
PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

Abstract: The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Two recent trends stand out: 1) recommender systems must be trained in a timely manner to cope with ever-growing new products and ever-changing user interests from online marketing and social networks; 2) state-of-the-art recommendation models introduce deep neural network (DNN) modules to improve prediction accuracy. Traditional CPU-based recommender systems cannot meet th…

Cited by 1 publication (2 citation statements)
References 21 publications (31 reference statements)
“…Parameter Server [9,20] is one of the most popular architectures for distributed DLRM training, and DMAML [5] customizes the Parameter Server architecture for MAML training in the CPU cluster. However, the two update loops in meta learning double the computation, and the computation-intensive dense layers become more complicated in DLRM [36,38], which makes them time-consuming to compute on CPUs and calls for GPU acceleration. Nevertheless, Parameter Server is mainly used in CPU clusters, and its design underutilizes the capability of the GPU, since the embedding layers held in the servers are I/O- and communication-intensive operators [30].…”
Section: Introduction
confidence: 99%
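A minimal sketch, assuming a PyTorch-style setup with a synthetic model and random data (not the cited systems' code), of the two MAML update loops the quote refers to: an inner adaptation pass on a task's support set and an outer meta-update on its query set, which together roughly double the computation of an ordinary training step.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

def forward_with(params, x):
    # Functional forward pass with explicit parameters, so the inner
    # update stays differentiable for the outer (meta) update.
    h = torch.relu(x @ params[0].t() + params[1])
    return h @ params[2].t() + params[3]

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):  # tasks per meta-batch (synthetic tasks here)
        x_s, y_s = torch.randn(8, 16), torch.randn(8, 1)  # support set
        x_q, y_q = torch.randn(8, 16), torch.randn(8, 1)  # query set
        params = list(model.parameters())
        # Inner loop: one adaptation step on the task's support set.
        support_loss = nn.functional.mse_loss(forward_with(params, x_s), y_s)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer loop: meta-gradient from the adapted parameters on the query set.
        query_loss = nn.functional.mse_loss(forward_with(adapted, x_q), y_q)
        query_loss.backward()
    meta_opt.step()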
“…Secondly, meta learning requires different data management from traditional deep learning training, and the conventional I/O design bottlenecks the training speed [2,24]. Meta learning requires assembling the batch data at both the task level and the batch level, whereas traditional deep learning only requires the batch level in the training pipeline [1,36,38]. To be more specific, each worker may hold batch data from different tasks, but the samples in a batch should belong to the identical task after shuffling for correctness.…”
Section: Introduction
confidence: 99%
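A small, hypothetical sketch of the task-plus-batch-level assembly described in the quote (names such as task_level_batches are illustrative, not the cited system's API): samples are first grouped by task, shuffled within each task, and only then cut into batches, so every batch a worker receives is homogeneous in its task even though batch order is shuffled across tasks.

import random
from collections import defaultdict

def task_level_batches(samples, batch_size, seed=0):
    """samples: iterable of (task_id, sample) pairs; returns (task_id, batch) pairs."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for task_id, sample in samples:          # task-level grouping
        by_task[task_id].append(sample)

    batches = []
    for task_id, task_samples in by_task.items():
        rng.shuffle(task_samples)            # shuffle within a task
        for i in range(0, len(task_samples), batch_size):
            batches.append((task_id, task_samples[i:i + batch_size]))
    rng.shuffle(batches)                     # shuffle batch order across tasks
    return batches

# Usage: every emitted batch contains samples from a single task only.
data = [(t, {"feature": i}) for t in range(3) for i in range(10)]
for task_id, batch in task_level_batches(data, batch_size=4):
    print(task_id, len(batch))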