2021
DOI: 10.1145/3432185
PLANC: Parallel Low-rank Approximation with Nonnegativity Constraints

Abstract: We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the …
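The distributed factor-update scheme the abstract describes can be illustrated with a minimal single-process sketch. This is a hypothetical illustration, not PLANC's actual code: it assumes a 1-D row distribution in which each MPI rank owns a block of the data matrix and the matching rows of W, while the small k×k and k×n products needed to update the replicated factor H are summed across ranks with an allreduce. The multiplicative-update rule of Lee and Seung stands in here for PLANC's solvers, and only one rank's local work is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
m_local, n, k = 40, 30, 4           # local rows, columns, target rank
A_local = rng.random((m_local, n))  # this rank's row block of the data

W_local = rng.random((m_local, k))  # rows of W owned by this rank
H = rng.random((k, n))              # H replicated on every rank

def mu_step(A, W, H, eps=1e-9):
    """One multiplicative-update sweep (Lee & Seung) on a local block."""
    # Update W: needs only the replicated H and the local block of A,
    # so no communication is required for this half of the sweep.
    W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    # Update H: in the distributed setting these two products would be
    # summed across ranks (MPI_Allreduce) before the elementwise update;
    # note only k*n and k*k entries move, never the full data matrix.
    WtA = W.T @ A   # k x n  -> allreduce(sum) across ranks in parallel
    WtW = W.T @ W   # k x k  -> allreduce(sum) across ranks in parallel
    H *= WtA / (WtW @ H + eps)
    return W, H

err0 = np.linalg.norm(A_local - W_local @ H)
for _ in range(50):
    W_local, H = mu_step(A_local, W_local, H)
err1 = np.linalg.norm(A_local - W_local @ H)
print(f"reconstruction error: {err0:.3f} -> {err1:.3f}")
```

The design point this sketch makes is the one the abstract relies on: the communicated quantities scale with the rank k, not with the data size, so the full tensor/matrix can stay partitioned across node memories.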

Cited by 9 publications (2 citation statements) | References 46 publications
“…Almost all distributed GPU implementations, including NMF-mGPU [28] and PLANC [33], rely on significant data communication to update the factors. This involves using CUDA-aware MPI primitives for data communication, or MPI distributed-memory offload through NVBLAS [33] without multi-node GPU communicators. Such an implementation incurs high data-movement costs from on-loading/offloading data to/from the device, which significantly raises communication cost relative to computation cost for large data decompositions.…”
Section: Related Work on Distributed NMF
confidence: 99%