2011
DOI: 10.1007/978-3-642-23400-2_33
Distributed Scalable Collaborative Filtering Algorithm

Cited by 11 publications (14 citation statements) · References 9 publications
“…We achieved a training time (using I-divergence and C6, Section III) of around 9.38s with the full Netflix dataset and prediction time of 2.8s on 1.4M ratings with RMSE (Root Mean Square Error) of 0.87 ± 0.02. This is around 4× better than the best prior distributed algorithm [19] for the same dataset. To the best of our knowledge, this is the highest known parallel performance at such high accuracy.…”
Section: Introduction
confidence: 61%
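The RMSE figure quoted above is the standard root-mean-square error over held-out ratings. A minimal sketch of the metric (the toy data is illustrative, not the Netflix evaluation):

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over parallel lists of ratings."""
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(se / len(predicted))

# toy held-out ratings, purely for illustration
err = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0])
```

At Netflix scale the same sum would be accumulated over roughly 1.4M test ratings, typically in parallel, with only the per-partition sums and counts combined at the end.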
“…Using multi-core clusters, we deliver around two orders of magnitude improvement in training time compared to the sequential concept decomposition technique [1], and around one order of magnitude improvement compared to the parallel concept decomposition technique [18]. Narang et al. [19] present a flat distributed co-clustering algorithm in which all processors in the system participate in each iteration of the co-clustering algorithm, and a hybrid of OpenMP and MPI is used to exploit both the intra-node and inter-node parallelism available on Blue Gene/P. Using the Netflix dataset (100M ratings), they demonstrate the performance and scalability of the algorithm on a 1024-node Blue Gene/P system, with a training time of around 6s on the full Netflix dataset. In this paper, we present a novel hierarchical approach to distributed co-clustering, along with load-balancing optimizations, leading to around 2× improvement in performance compared to [19].…”
Section: Related Work
confidence: 99%
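The co-clustering iteration referenced above can be illustrated with a minimal single-node sketch. This uses a squared-Euclidean variant (the papers above use I-divergence, a different Bregman divergence) and a simple deterministic prototype initialization; the function name and data are assumptions for illustration, not the papers' implementation:

```python
def cocluster(A, k_rows, k_cols, iters=10):
    """Alternately assign rows and columns of dense matrix A to the
    clusters whose co-cluster means best reconstruct them
    (squared-error sketch of Bregman co-clustering)."""
    n, m = len(A), len(A[0])

    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    cols = [[A[i][j] for i in range(n)] for j in range(m)]
    # deterministic init: the first k rows/columns serve as prototypes
    r = [min(range(k_rows), key=lambda p: sqdist(A[i], A[p])) for i in range(n)]
    c = [min(range(k_cols), key=lambda q: sqdist(cols[j], cols[q])) for j in range(m)]

    for _ in range(iters):
        # compute co-cluster means from the current assignments
        s = [[0.0] * k_cols for _ in range(k_rows)]
        cnt = [[0] * k_cols for _ in range(k_rows)]
        for i in range(n):
            for j in range(m):
                s[r[i]][c[j]] += A[i][j]
                cnt[r[i]][c[j]] += 1
        mu = [[s[p][q] / cnt[p][q] if cnt[p][q] else 0.0
               for q in range(k_cols)] for p in range(k_rows)]
        # reassign each row, then each column, to its best cluster
        r = [min(range(k_rows),
                 key=lambda p: sum((A[i][j] - mu[p][c[j]]) ** 2 for j in range(m)))
             for i in range(n)]
        c = [min(range(k_cols),
                 key=lambda q: sum((A[i][j] - mu[r[i]][q]) ** 2 for i in range(n)))
             for j in range(m)]
    return r, c

# a 2x2 checkerboard ratings matrix; the sketch recovers its block structure
A = [[5, 1, 5, 1],
     [1, 5, 1, 5],
     [5, 1, 5, 1],
     [1, 5, 1, 5]]
r, c = cocluster(A, 2, 2)
```

In the distributed setting discussed above, the ratings matrix is partitioned across nodes; each node reassigns its local rows/columns, and only the per-co-cluster sums and counts are exchanged between iterations.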
“…Cosine similarity does not take the user rating scale into account, as in [1][2][3][4][5]: for one user, a score above 3 marks items they like, while for user B only a score above 4 does. By subtracting each user's average score, the adjusted (modified) cosine similarity measure corrects this problem.…”
Section: Algorithm Calculation
confidence: 99%
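The adjusted cosine measure described above subtracts each user's mean rating before computing the cosine, so that users with different rating scales contribute comparable deviations. A minimal sketch (the dict-based representation and names are assumptions for illustration):

```python
import math

def adjusted_cosine(ratings_a, ratings_b, user_means):
    """Adjusted cosine similarity between two items.
    ratings_a, ratings_b: dicts mapping user -> rating for each item;
    user_means: dict mapping user -> that user's mean rating.
    Only users who rated both items contribute."""
    common = set(ratings_a) & set(ratings_b)
    num = sum((ratings_a[u] - user_means[u]) * (ratings_b[u] - user_means[u])
              for u in common)
    den_a = math.sqrt(sum((ratings_a[u] - user_means[u]) ** 2 for u in common))
    den_b = math.sqrt(sum((ratings_b[u] - user_means[u]) ** 2 for u in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

# both users rate both items above their own means, so similarity is high
user_means = {"A": 3.0, "B": 4.0}
item1 = {"A": 4, "B": 5}
item2 = {"A": 5, "B": 5}
sim = adjusted_cosine(item1, item2, user_means)
```

Plain cosine on the raw vectors would treat user B's 5 the same as user A's 5, even though it sits much closer to B's personal mean; mean-centering removes that bias.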
“…In the era of rapid Web 2.0 development, with changing user needs and ever-expanding data volumes, collaborative filtering under existing technical conditions has several drawbacks [2]: single-machine processing is inefficient and wastes computing resources, the single-machine model can no longer cope with massive data, and processing speed and resource utilization are severely constrained. The parallelism and scalability of existing platforms cannot meet actual business needs; data sparsity and poor algorithm scalability reduce accuracy; and overall system performance degrades as the numbers of users and items grow. To address these problems, this paper studies the core ideas of collaborative filtering, introduces the Hadoop and Spark distributed computing architectures into a personalized recommendation system, and proposes a hybrid recommendation algorithm based on Spark. Through rigorous testing and repeated comparison, it overcomes, to a certain extent, the problems of low recommendation accuracy and poor scalability, and, based on Spark's in-memory parallel computing [3], improves the algorithm's speedup and greatly reduces the system's running time.…”
Section: Introduction
confidence: 99%
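The data-parallel pattern behind the Spark approach described above (partition the ratings, compute partial results on each partition, merge the partials) can be illustrated on one machine with a thread pool standing in for cluster nodes. The toy data and helper names are assumptions; a real Spark job would express the same map/reduce steps as RDD or DataFrame operations:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# toy user -> {item: rating} data, purely for illustration
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"b": 2, "c": 5},
}

def partial_cooccurrence(user_chunk):
    """Map step: item-pair co-rating counts over one partition of users."""
    counts = {}
    for u in user_chunk:
        for i, j in combinations(sorted(ratings[u]), 2):
            counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts

def merge(partials):
    """Reduce step: sum the partial counts from every partition."""
    total = {}
    for d in partials:
        for k, v in d.items():
            total[k] = total.get(k, 0) + v
    return total

users = list(ratings)
chunks = [users[:2], users[2:]]  # two partitions standing in for cluster nodes
with ThreadPoolExecutor(max_workers=2) as ex:
    result = merge(ex.map(partial_cooccurrence, chunks))
```

Because the map step touches only its own partition and the reduce step is a commutative, associative sum, the computation scales out by adding partitions, which is exactly the property Spark exploits at cluster scale.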