GPU-accelerated Large-Scale Non-negative Matrix Factorization Using Spark

Tang, Bing; Kang, Linyao; Xia, Yanmin; Zhang, Li

doi:10.1007/978-3-030-12981-1_13

Cited by 3 publications

(2 citation statements)

References 10 publications

“…Sun et al realized large-scale NMF based on MapReduce in [32], and Liu et al also proposed a distributed NMF based on MapReduce for processing large-scale web data using Hadoop streaming method [19]. In our previous work [33], we proposed a parallel NMF algorithm in Spark platform, which makes full use of the advantages of in-memory computation mode.…”

Section: Scientific Programming (mentioning)

confidence: 99%

Collaborative Filtering Recommendation Using Nonnegative Matrix Factorization in GPU-Accelerated Spark Platform

Tang, Kang, Zhang et al. 2021

Scientific Programming

Self Cite

Nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data compression and its capability of extracting highly interpretable parts from data sets, and it has also been applied to various fields, such as recommendations, image analysis, and text clustering. However, as the size of the matrix increases, the processing speed of nonnegative matrix factorization is very slow. To solve this problem, this paper proposes a parallel algorithm based on GPU for NMF in Spark platform, which makes full use of the advantages of in-memory computation mode and GPU acceleration. The new GPU-accelerated NMF on Spark platform is evaluated in a 4-node Spark heterogeneous cluster using Google Compute Engine by configuring each node a NVIDIA K80 CUDA device, and experimental results indicate that it is competitive in terms of computational time against the existing solutions on a variety of matrix orders. Furthermore, a GPU-accelerated NMF-based parallel collaborative filtering (CF) algorithm is also proposed, utilizing the advantages of data dimensionality reduction and feature extraction of NMF, as well as the multicore parallel computing mode of CUDA. Using real MovieLens data sets, experimental results have shown that the parallelization of NMF-based collaborative filtering on Spark platform effectively outperforms traditional user-based and item-based CF with a higher processing speed and higher recommendation accuracy.

“…Recently, there has been growing interest in scaling tensor operations to bigger data and more processors in both the data mining/machine learning and the high performance computing communities. For sparse tensors, there have been parallelization efforts to compute CP decompositions on shared-memory platforms [34,51], distributed-memory platforms [24,26,38,50] and GPUs [40,41,52], and these approaches can be generalized to constrained problems [49].…”

Section: Related Work (mentioning)

confidence: 99%

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints

Eswar, Hayashi, Ballard et al. 2019

Preprint

We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called PLANC (Parallel Low Rank Approximation with Non-negativity Constraints), which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.