Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225096
Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

Abstract: Matrix factorization (MF) discovers latent features from observations and has shown great promise in collaborative filtering, data compression, feature extraction, word embedding, and other fields. While many problem-specific optimization techniques have been proposed, alternating least squares (ALS) remains popular due to its general applicability (e.g., it easily handles positive-unlabeled inputs), fast convergence, and parallelizability. Current MF implementations are either optimized for a single m…
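For context, the ALS procedure the abstract refers to alternates two closed-form least-squares solves: fix the item factors and solve for the user factors, then swap. A minimal dense NumPy sketch (illustrative only; the paper's cuMF implementation is a memory-optimized multi-GPU CUDA system, not this helper):

```python
import numpy as np

def als(R, rank=8, reg=0.1, iters=10, seed=0):
    """Factor R ~ U @ V.T by alternating ridge-regression solves.

    Hypothetical helper for illustration; not the cuMF API.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.standard_normal((m, rank)) * 0.1
    V = rng.standard_normal((n, rank)) * 0.1
    I = reg * np.eye(rank)
    for _ in range(iters):
        # Fix V: each row of U has a closed-form solution of
        # (V^T V + reg*I) u_i = V^T r_i; solve all rows at once.
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
        # Fix U: the symmetric update for V.
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, V
```

Each half-step is an independent least-squares problem per row, which is what makes ALS embarrassingly parallel across users (or items) and a good fit for GPUs.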

Cited by 6 publications (1 citation statement) · References 25 publications
“…To take advantage of hardware acceleration, [17] propose cuMF, a single-machine, memory-optimized multi-GPU implementation that scales to relatively large problems (up to 10^11 model parameters), further extended in [18] to allow approximate computation via a conjugate gradient solver. [17, 18] exploit the GPU memory hierarchy and model parallelism across GPUs to produce a highly performant implementation of ALS. Just as cuMF exploits unique properties of the GPU hardware, ALX overcomes various challenges and exploits unique properties of TPUs.…”
Section: Related Work
confidence: 99%
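The "approximate computation via a conjugate gradient solver" mentioned above replaces ALS's exact normal-equation solve with a few CG iterations on the same symmetric positive-definite system. A sketch of that inner solver (illustrative names, not the cuMF code):

```python
import numpy as np

def cg_solve(A, b, iters=3):
    """Approximately solve A x = b for SPD A with a fixed CG iteration budget.

    Capping `iters` at a small number is what trades exactness for speed
    in approximate ALS; with iters equal to A's dimension, CG is exact
    in exact arithmetic.
    """
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < 1e-12:  # converged early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In the ALS setting, `A` would be the per-row normal matrix (e.g. `V.T @ V + reg*I`) and `b` the corresponding right-hand side, so CG avoids a full Cholesky factorization per row.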