2011
DOI: 10.1007/978-3-642-23400-2_33
Distributed Scalable Collaborative Filtering Algorithm

Cited by 11 publications (14 citation statements) · References 9 publications
“…We achieved a training time (using I-divergence and C6, Section III) of around 9.38s with the full Netflix dataset and prediction time of 2.8s on 1.4M ratings with RMSE (Root Mean Square Error) of 0.87 ± 0.02. This is around 4× better than the best prior distributed algorithm [19] for the same dataset. To the best of our knowledge, this is the highest known parallel performance at such high accuracy.…”
Section: Introduction
confidence: 61%
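The RMSE figure quoted above is the standard root-mean-square error over held-out ratings. A minimal sketch of the metric (the toy data is illustrative, not the Netflix evaluation):

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over parallel lists of ratings."""
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(se / len(predicted))

# toy held-out ratings, purely for illustration
err = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0])
```

At Netflix scale the same sum would be accumulated over roughly 1.4M test ratings, typically in parallel, with only the per-partition sums and counts combined at the end.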
“…Using multi-core clusters, we deliver around two orders of magnitude improvement in training time compared to the sequential concept decomposition technique [1], and around one order of magnitude improvement compared to the parallel concept decomposition technique [18]. Narang et al. [19] present a flat distributed co-clustering algorithm in which all processors in the system participate in each iteration of the co-clustering algorithm, and a hybrid of OpenMP and MPI is used to exploit both the intra-node and inter-node parallelism available on Blue Gene/P. Using the Netflix dataset (100M ratings), they demonstrate the performance and scalability of the algorithm on a 1024-node Blue Gene/P system, with a training time of around 6s on the full Netflix dataset. In this paper, we present a novel hierarchical approach to distributed co-clustering, along with load-balancing optimizations, leading to around 2× improvement in performance compared to [19].…”
Section: Related Work
confidence: 99%
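The co-clustering iteration referenced above can be illustrated with a minimal single-node sketch. This uses a squared-Euclidean variant (the papers above use I-divergence, a different Bregman divergence) and a simple deterministic prototype initialization; the function name and data are assumptions for illustration, not the papers' implementation:

```python
def cocluster(A, k_rows, k_cols, iters=10):
    """Alternately assign rows and columns of dense matrix A to the
    clusters whose co-cluster means best reconstruct them
    (squared-error sketch of Bregman co-clustering)."""
    n, m = len(A), len(A[0])

    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    cols = [[A[i][j] for i in range(n)] for j in range(m)]
    # deterministic init: the first k rows/columns serve as prototypes
    r = [min(range(k_rows), key=lambda p: sqdist(A[i], A[p])) for i in range(n)]
    c = [min(range(k_cols), key=lambda q: sqdist(cols[j], cols[q])) for j in range(m)]

    for _ in range(iters):
        # compute co-cluster means from the current assignments
        s = [[0.0] * k_cols for _ in range(k_rows)]
        cnt = [[0] * k_cols for _ in range(k_rows)]
        for i in range(n):
            for j in range(m):
                s[r[i]][c[j]] += A[i][j]
                cnt[r[i]][c[j]] += 1
        mu = [[s[p][q] / cnt[p][q] if cnt[p][q] else 0.0
               for q in range(k_cols)] for p in range(k_rows)]
        # reassign each row, then each column, to its best cluster
        r = [min(range(k_rows),
                 key=lambda p: sum((A[i][j] - mu[p][c[j]]) ** 2 for j in range(m)))
             for i in range(n)]
        c = [min(range(k_cols),
                 key=lambda q: sum((A[i][j] - mu[r[i]][q]) ** 2 for i in range(n)))
             for j in range(m)]
    return r, c

# a 2x2 checkerboard ratings matrix; the sketch recovers its block structure
A = [[5, 1, 5, 1],
     [1, 5, 1, 5],
     [5, 1, 5, 1],
     [1, 5, 1, 5]]
r, c = cocluster(A, 2, 2)
```

In the distributed setting discussed above, the ratings matrix is partitioned across nodes; each node reassigns its local rows/columns, and only the per-co-cluster sums and counts are exchanged between iterations.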
“…Cosine similarity does not take the user rating scale into account, as in [1][2][3][4][5]: for one user, a score above 3 marks items they like, while for user B only a score above 4 does. By subtracting each user's average score, the adjusted (modified) cosine similarity measure corrects this problem.…”
Section: Algorithm Calculation
confidence: 99%
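The adjusted cosine measure described above subtracts each user's mean rating before computing the cosine, so that users with different rating scales contribute comparable deviations. A minimal sketch (the dict-based representation and names are assumptions for illustration):

```python
import math

def adjusted_cosine(ratings_a, ratings_b, user_means):
    """Adjusted cosine similarity between two items.
    ratings_a, ratings_b: dicts mapping user -> rating for each item;
    user_means: dict mapping user -> that user's mean rating.
    Only users who rated both items contribute."""
    common = set(ratings_a) & set(ratings_b)
    num = sum((ratings_a[u] - user_means[u]) * (ratings_b[u] - user_means[u])
              for u in common)
    den_a = math.sqrt(sum((ratings_a[u] - user_means[u]) ** 2 for u in common))
    den_b = math.sqrt(sum((ratings_b[u] - user_means[u]) ** 2 for u in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

# both users rate both items above their own means, so similarity is high
user_means = {"A": 3.0, "B": 4.0}
item1 = {"A": 4, "B": 5}
item2 = {"A": 5, "B": 5}
sim = adjusted_cosine(item1, item2, user_means)
```

Plain cosine on the raw vectors would treat user B's 5 the same as user A's 5, even though it sits much closer to B's personal mean; mean-centering removes that bias.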
“…In the era of rapid Web 2.0 development, with changing user needs and ever-expanding data volumes, collaborative filtering under existing technical conditions has several drawbacks [2]: single-machine processing is inefficient and wastes computing resources, the single-machine model can no longer cope with massive data, and processing speed and resource utilization are severely constrained. The parallelism and scalability of existing platforms cannot meet actual business needs; data sparsity and poor algorithm scalability reduce accuracy; and overall system performance degrades as the numbers of users and items grow. To address these problems, this paper studies the core ideas of collaborative filtering, introduces the Hadoop and Spark distributed computing architectures into a personalized recommendation system, and proposes a hybrid recommendation algorithm based on Spark. Through rigorous testing and repeated comparison, it overcomes, to a certain extent, the problems of low recommendation accuracy and poor scalability, and, based on Spark's in-memory parallel computing [3], improves the algorithm's speedup and greatly reduces the system's running time.…”
Section: Introduction
confidence: 99%
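The data-parallel pattern behind the Spark approach described above (partition the ratings, compute partial results on each partition, merge the partials) can be illustrated on one machine with a thread pool standing in for cluster nodes. The toy data and helper names are assumptions; a real Spark job would express the same map/reduce steps as RDD or DataFrame operations:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# toy user -> {item: rating} data, purely for illustration
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"b": 2, "c": 5},
}

def partial_cooccurrence(user_chunk):
    """Map step: item-pair co-rating counts over one partition of users."""
    counts = {}
    for u in user_chunk:
        for i, j in combinations(sorted(ratings[u]), 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts

def merge(partials):
    """Reduce step: sum the partial counts from every partition."""
    total = {}
    for d in partials:
        for k, v in d.items():
            total[k] = total.get(k, 0) + v
    return total

users = list(ratings)
chunks = [users[:2], users[2:]]  # two partitions standing in for cluster nodes
with ThreadPoolExecutor(max_workers=2) as ex:
    result = merge(ex.map(partial_cooccurrence, chunks))
```

Because the map step touches only its own partition and the reduce step is a commutative, associative sum, the computation scales out by adding partitions, which is exactly the property Spark exploits at cluster scale.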