Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Yu, Hsiang‐Fu; Hsieh, Cho‐Jui; Si, Si; Dhillon, Inderjit S.

doi:10.1109/icdm.2012.168

Cited by 197 publications

(159 citation statements)

References 14 publications

Mentioning: 156


“…The early use of CD for MF was in [7], but here we consider the efficient implementation in [29]. The idea is to update one column of W and H at a time.…”

Section: Coordinate Descent (CD). Type: mentioning

confidence: 99%

“…Thus, the construction of (3.8) costs O(|Ω|k) operations. In [29], the time complexity can be reduced to O(|Ω|) by maintaining the following residual…”

Section: Implementation Details. Type: mentioning

confidence: 99%

“…Besides, their procedure is more complicated for needing the eigen-decomposition of a k by k matrix. The recent work on PU (positive-unlabeled) learning [10] has mentioned that the CD framework in [29] can be modified to have the complexity (3.15), but detailed derivations are not given.…”

Section: Implementation Details. Type: mentioning

confidence: 99%
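The excerpts above describe updating one column of W and H at a time while maintaining a residual. A minimal sketch of that idea follows, assuming the regularized squared-error objective over the observed entries Ω; the function names (`cd_sweep`, `update_column`, `init_residual`) and the dictionary-based storage are illustrative choices, not taken from [29] or from the citing paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(A, W, H, lam):
    """Squared error on observed entries plus L2 regularization (assumed form)."""
    err = sum((a - dot(W[i], H[j])) ** 2 for (i, j), a in A.items())
    reg = sum(x * x for row in W for x in row) + sum(x * x for row in H for x in row)
    return err + lam * reg

def init_residual(A, W, H):
    """R[i, j] = A[i, j] - w_i . h_j, stored only for observed entries."""
    return {(i, j): a - dot(W[i], H[j]) for (i, j), a in A.items()}

def update_column(R, rows, cols, t, lam, by_row):
    """Closed-form update of column t of `rows`, holding `cols` fixed.

    Requires lam > 0 so that rows without observations do not divide by zero.
    """
    num = [0.0] * len(rows)
    den = [lam] * len(rows)
    for (i, j), r in R.items():
        p, q = (i, j) if by_row else (j, i)
        num[p] += r * cols[q][t]
        den[p] += cols[q][t] ** 2
    for p in range(len(rows)):
        rows[p][t] = num[p] / den[p]

def cd_sweep(W, H, R, lam):
    """One pass over all k latent dimensions; each dimension costs O(|Omega|)."""
    k = len(W[0])
    for t in range(k):
        # Add the current rank-one term back into the residual.
        for (i, j) in R:
            R[(i, j)] += W[i][t] * H[j][t]
        update_column(R, W, H, t, lam, by_row=True)   # column t of W
        update_column(R, H, W, t, lam, by_row=False)  # column t of H
        # Subtract the refreshed rank-one term so R stays in sync with W and H.
        for (i, j) in R:
            R[(i, j)] -= W[i][t] * H[j][t]

# Toy data: 3 users, 2 items, 4 observed ratings, k = 2.
A = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}
W = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.3]]
H = [[0.2, 0.1], [0.1, 0.3]]
R = init_residual(A, W, H)
for _ in range(5):
    cd_sweep(W, H, R, lam=0.1)
```

Because the residual is kept up to date, no step recomputes the inner products w_i · h_j from scratch, which is where the factor of k is saved per column update.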


Selection of Negative Samples for One-class Matrix Factorization

Yu, Bilenko

2017

Proceedings of the 2017 SIAM International Conference on Data Mining

Self Cite


Many recommender systems have only implicit user feedback. The two possible ratings are positive and negative, but only part of positive entries are observed. One-class matrix factorization (MF) is a popular approach for such scenarios by treating some missing entries as negative. Two major ways to select negative entries are by sub-sampling a set with similar size to that of observed positive entries or by including all missing entries as negative. They are referred to as "subsampled" and "full" approaches in this work, respectively. Currently detailed comparisons between these two selection schemes on large-scale data are still lacking. One important reason is that the "full" approach leads to a hard optimization problem after treating all missing entries as negative. In this paper, we successfully develop efficient optimization techniques to solve this challenging problem so that the "full" approach becomes practically viable. We then compare in detail the two approaches "subsampled" and "full" for selecting negative entries. Results show that the "full" approach of including much more missing entries as negative yields better results.
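To make the two selection schemes in the abstract concrete, here is a small sketch. It assumes uniform sampling for the "subsampled" scheme and explicit enumeration for the "full" scheme; the function names are hypothetical, and the paper's actual sampling distribution and optimization techniques are not reproduced here.

```python
import random

def subsampled_negatives(positives, m, n, seed=0):
    """Draw as many missing (i, j) pairs as there are observed positives.

    Uniform sampling is an assumption made for illustration only.
    """
    pos = set(positives)
    if m * n - len(pos) < len(pos):
        raise ValueError("not enough missing entries to sample from")
    rng = random.Random(seed)
    negs = set()
    while len(negs) < len(pos):
        cand = (rng.randrange(m), rng.randrange(n))
        if cand not in pos:
            negs.add(cand)
    return negs

def full_negatives(positives, m, n):
    """Treat every missing entry as negative: O(m * n) pairs."""
    pos = set(positives)
    return {(i, j) for i in range(m) for j in range(n) if (i, j) not in pos}

positives = [(0, 1), (1, 2), (3, 0)]
sub = subsampled_negatives(positives, m=4, n=5)
full = full_negatives(positives, m=4, n=5)
```

Enumerating the "full" set is only feasible on toy sizes like this one. Its O(mn) growth is the reason the abstract calls the resulting optimization problem hard.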


“…Commonly used methods for (3.2) include stochastic gradient descent (SGD) [10,18], alternating least squares (ALS) [19,11] and coordinate descent (CD) [21]. A variant of (3.2) without the quadratic regularization is solved by a weighted alternating method LMaFit [20].…”

Section: L2 Models. Type: mentioning

confidence: 99%

“…Efficient computational methods for (1.1) include stochastic gradient descent, alternating least squares, and coordinate descent methods [10,18,21]. We notice that the error term of model (1.1) is L2 squared.…”

Section: Introduction. Type: mentioning

confidence: 99%

A Coordinate Descent Method for Robust Matrix Factorization and Applications

Sheen

2016

SIURO


Matrix factorization methods are widely used for extracting latent factors for low rank matrix completion and rating prediction problems arising in recommender systems of on-line retailers. Most of the existing models are based on L2 fidelity (quadratic functions of factorization error). In this work, a coordinate descent (CD) method is developed for matrix factorization under L1 fidelity so that the related minimization is done one variable at a time and the factorization error is sparsely distributed. In low rank random matrix completion and rating prediction of MovieLens-100k datasets, the CDL1 method shows remarkable stability and accuracy under gross corruption of training (observation) data while the L2 fidelity based methods rapidly deteriorate. A closed form analytical solution is found for the one-dimensional L1-fidelity subproblem, and is used as a building block of CDL1 algorithm whose convergence is analyzed. The connection with the well-known convex method, the robust principal component analysis (RPCA), is made. A comparison with RPCA on recovering low rank Gaussian matrices under sparse and independent Gaussian noise shows that CDL1 maintains accuracy at much lower sampling ratios (from much fewer observed entries) than that for RPCA.
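The abstract mentions a closed-form solution for the one-dimensional L1-fidelity subproblem. As a hedged illustration, assume the subproblem has the unregularized form min over x of Σ_i |r_i − x·h_i|; the paper's exact subproblem may include further terms. Under that assumption the minimizer is a weighted median of the ratios r_i/h_i with weights |h_i|. The names `l1_coordinate_min` and `l1_loss` are illustrative.

```python
def l1_loss(x, r, h):
    """Sum of absolute residuals for a single scalar variable x."""
    return sum(abs(ri - x * hi) for ri, hi in zip(r, h))

def l1_coordinate_min(r, h):
    """Weighted median of r_i / h_i with weights |h_i| (terms with h_i == 0 are constant in x)."""
    pts = sorted((ri / hi, abs(hi)) for ri, hi in zip(r, h) if hi != 0)
    if not pts:
        raise ValueError("all coefficients are zero; every x is a minimizer")
    total = sum(w for _, w in pts)
    acc = 0.0
    for p, w in pts:
        acc += w
        if acc >= total / 2:
            return p
    return pts[-1][0]  # safeguard against floating-point rounding

# One grossly corrupted observation (10.0) among three.
r = [1.0, 2.0, 10.0]
h = [1.0, 1.0, 1.0]
x_l1 = l1_coordinate_min(r, h)  # 2.0
```

In this toy case the L1 solution stays at 2.0, whereas the least-squares solution Σ r_i h_i / Σ h_i² would be 13/3 ≈ 4.33, pulled toward the corrupted value. That difference is the robustness property the abstract reports.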


GPUSGD: A GPU‐accelerated stochastic gradient descent algorithm for matrix factorization

Jin, Lai, et al.

2015

Concurrency and Computation


SUMMARY: Matrix factorization is one of the leading techniques for many applications such as social network-based recommendation systems. As of today, many parallel stochastic gradient descent (SGD) methods have been proposed to address the matrix factorization issue on shared-memory (multi-core) systems and distributed systems. However, these methods cannot be improved significantly on graphics processing unit (GPU) because the serious over-writing problem and thread divergence may occur. The fundamental reason for such undesired results is that GPU is a parallel single instruction multiple data device, which only can greatly improve the applications with fine-grained parallelism. In this paper, we propose an efficient GPU algorithm, named GPUSGD, to solve the matrix factorization problem based on SGD method. The major advantage of the proposed GPUSGD is that such method not only can handle the over-writing problem but also can avoid the performance loss caused by the thread divergence. The experimental results show that GPUSGD performs much better in accelerating the matrix factorization compared with the existing state-of-the-art parallel methods. To the best of our knowledge, this is the first work that develops a parallel SGD method to improve the matrix factorization on GPU.
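For orientation, a sequential SGD update for matrix factorization can be sketched as below. This is a generic textbook form on the CPU, not the GPUSGD algorithm. The `conflicts` helper encodes an assumption about what the abstract calls the over-writing problem: two ratings that share a row or a column would update the same factor vector if processed concurrently.

```python
def sgd_step(W, H, i, j, a, lr, lam):
    """Update row i of W and row j of H from one observed rating a.

    Returns the prediction error measured before the update.
    """
    w, h = W[i], H[j]
    err = a - sum(x * y for x, y in zip(w, h))
    for t in range(len(w)):
        w_t, h_t = w[t], h[t]  # keep old values so both updates use the same point
        w[t] += lr * (err * h_t - lam * w_t)
        h[t] += lr * (err * w_t - lam * h_t)
    return err

def conflicts(r1, r2):
    """True if two (i, j) ratings touch the same row of W or the same row of H."""
    return r1[0] == r2[0] or r1[1] == r2[1]

W = [[0.1, 0.1]]
H = [[0.1, 0.1]]
first_err = sgd_step(W, H, 0, 0, a=1.0, lr=0.1, lam=0.05)
second_err = sgd_step(W, H, 0, 0, a=1.0, lr=0.1, lam=0.05)
```

Ratings for which `conflicts` returns False can be processed in parallel without one thread overwriting another's update.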

