Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539070
Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

Cited by 18 publications (4 citation statements) · References 16 publications
“…Other works directly utilize native key-value hash tables to allow dynamic growth of table size [12,15,20,21]. These implementations build upon TensorFlow but rely on either specially designed software mechanisms [14,15,20] or hardware [21] to access and manage their hash tables. Compared to these solutions, Monolith's hash table is yet another native TensorFlow operation.…”
Section: Related Work (mentioning)
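The approach described in this excerpt, backing an embedding table with a key-value hash table so it can grow as new feature IDs appear, can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not Monolith's or TensorFlow's actual implementation; the class name and initialization scheme are made up for illustration.

```python
import numpy as np

class DynamicEmbeddingTable:
    """Toy key-value embedding table that grows as new feature IDs arrive."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}                      # feature id -> embedding vector
        self.rng = np.random.default_rng(seed)

    def lookup(self, feature_ids):
        rows = []
        for fid in feature_ids:
            if fid not in self.table:
                # The table grows on first sight of a key instead of being
                # pre-allocated as a fixed-size dense matrix.
                self.table[fid] = self.rng.normal(0, 0.01, self.dim).astype(np.float32)
            rows.append(self.table[fid])
        return np.stack(rows)

table = DynamicEmbeddingTable(dim=8)
print(table.lookup([42, 7, 42]).shape)       # (3, 8); only two distinct keys stored
```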
“…To support online updates and avoid memory issues, both [12] and [20] designed feature eviction mechanisms to flexibly adjust the size of embedding tables. Both [12] and [14] support some form of online training, where learned parameters are synced to serving at a relatively short interval compared to traditional batch training, with fault-tolerance mechanisms. Monolith takes a similar approach to elastically admit and evict features, while it has a more lightweight parameter synchronization mechanism to guarantee model quality.…”
Section: Related Work (mentioning)
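The admission and eviction mechanisms mentioned above can be sketched as a frequency-based admission filter combined with time-based eviction of stale features. The policy, thresholds, and class below are illustrative assumptions, not the actual mechanism of any of the cited systems.

```python
import time

class FeatureAdmissionEviction:
    """Toy policy: admit a feature once it appears often enough, evict it when stale."""

    def __init__(self, admit_after=5, expire_seconds=3600.0):
        self.counts = {}         # feature id -> occurrences seen so far
        self.last_seen = {}      # feature id -> timestamp of last access
        self.embeddings = {}     # admitted feature id -> embedding (placeholder)
        self.admit_after = admit_after
        self.expire_seconds = expire_seconds

    def observe(self, fid):
        self.counts[fid] = self.counts.get(fid, 0) + 1
        self.last_seen[fid] = time.time()
        # Admit a feature only after it has been seen admit_after times,
        # so rare IDs never consume embedding memory.
        if fid not in self.embeddings and self.counts[fid] >= self.admit_after:
            self.embeddings[fid] = [0.0] * 8   # placeholder vector

    def evict_stale(self):
        now = time.time()
        stale = [f for f, t in self.last_seen.items() if now - t > self.expire_seconds]
        for f in stale:
            # Dropping stale features keeps the table size bounded over time.
            self.embeddings.pop(f, None)
            self.counts.pop(f, None)
            self.last_seen.pop(f, None)
```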
“…While the convolution operation can improve traditional networks through strategies such as sparse interactions, parameter sharing, and equivariant representations, it also brings high computational overhead and long training times [9,10]. Furthermore, the number of parameters (on the order of millions) and of calculations (on the order of billions) grows exponentially as the network structure deepens, making these problems more severe and aggravating the demand for high-performance training environments [11]. For this purpose, major hardware companies have developed processing units specialized for training large neural networks, such as Huawei's neural network processing unit, Google's tensor processing unit, and ATI's video processing unit, which achieve significant improvements in large-scale parallel computing compared with general-purpose processors [12].…”
Section: Introduction (mentioning)
“…Many recent advances in deep learning have been attributed to significant increases in model size to hundreds of billions of parameters and training on ever-growing datasets [5,31,32,45]. Recent studies suggest that a trillion-parameter model would require at least 2 TB of memory simply to store model parameters, and tens or hundreds of TB for training [18,24,37,38,42]. Naturally, scaling large-model training has received intense attention over the past few years [3,11,29,45,53].…”
Section: Introduction (mentioning)
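The memory figures quoted here can be checked with back-of-the-envelope arithmetic: a trillion parameters stored at 2 bytes each (fp16) already take about 2 TB, and training state multiplies that several times before activations are counted. The per-parameter breakdown below (mixed-precision Adam-style training, ~16 bytes per parameter) is an illustrative assumption, not a figure taken from the cited studies.

```python
params = 1e12                     # one trillion parameters
weights_tb = params * 2 / 1e12    # fp16 weights alone: ~2 TB

# Assumed breakdown for mixed-precision Adam-style training:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + two fp32 Adam moments (4 + 4) = 16 bytes per parameter.
training_tb = params * (2 + 2 + 4 + 4 + 4) / 1e12

print(f"weights only: {weights_tb:.0f} TB, training state: {training_tb:.0f} TB")
```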