Recommendation is an indispensable technique, especially in e-commerce services such as Amazon or Netflix, for presenting users with items that match their preferences. Matrix factorization is a well-known recommendation algorithm that estimates affinities between users and items solely from ratings explicitly given by users. To handle large amounts of data, stochastic gradient descent (SGD), an online loss-minimization algorithm, can be applied to matrix factorization. SGD is effective in terms of both convergence speed and memory consumption, but it is difficult to parallelize because of its inherently sequential nature.

FPSGD by Zhuang et al. [15] is an existing parallel SGD method for matrix factorization that divides the rating matrix into many small blocks. Threads work on blocks chosen so that they never update the same rows or columns of the factor matrices. Owing to this technique, FPSGD achieves higher convergence speed than other existing methods. Still, as we demonstrate in this paper, FPSGD does not scale beyond 32 cores on the 1.4 GB Netflix dataset, because assigning non-conflicting blocks to threads requires a lock operation.

In this work, we propose an alternative parallel SGD approach for matrix factorization based on a task-parallel programming model. As a result, we overcome the bottleneck of FPSGD and achieve higher scalability on 64 cores.
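To make the data dependence concrete, the following is a minimal sketch of the per-rating SGD update for matrix factorization, assuming a squared-error loss with L2 regularization; the function name and the parameters lrate and lambda are illustrative, not taken from the paper or from FPSGD.

#include <cstddef>
#include <vector>

// One SGD step for an observed rating r of user u and item i.
// The model predicts the dot product of the latent vectors p_u and q_i,
// and both vectors are moved along the gradient of the regularized
// squared error. (Sketch only; names are illustrative.)
void sgd_update(std::vector<float>& p_u,  // latent factors of user u
                std::vector<float>& q_i,  // latent factors of item i
                float r,                  // observed rating r_{u,i}
                float lrate,              // learning rate
                float lambda)             // regularization weight
{
    // Prediction error: e = r - p_u . q_i
    float pred = 0.0f;
    for (std::size_t k = 0; k < p_u.size(); ++k)
        pred += p_u[k] * q_i[k];
    const float e = r - pred;

    // Gradient step on both factor vectors. Two ratings sharing a row
    // or column touch the same p_u or q_i, which is the conflict that
    // block-partitioned schemes such as FPSGD must avoid across threads.
    for (std::size_t k = 0; k < p_u.size(); ++k) {
        const float pk = p_u[k];
        p_u[k] += lrate * (e * q_i[k] - lambda * pk);
        q_i[k] += lrate * (e * pk - lambda * q_i[k]);
    }
}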