Towards reproducibility in recommender-systems research

Beel, Joeran; Breitinger, Corinna; Langer, Stefan; Lommatzsch, Andreas; Gipp, Béla

doi:10.1007/s11257-016-9174-x

Cited by 61 publications (41 citation statements)

References 81 publications (103 reference statements)


“…
• A working version of the source code is available or the code only has to be modified in minimal ways to work correctly.³
• At least one dataset used in the original paper is available. A further requirement here is that either the originally-used train-test splits are publicly available or that they can be reconstructed based on the information in the paper.…”

Section: Research Methods, 2.1 Collecting Reproducible Papers (mentioning)

confidence: 99%


Are we really making much progress? A worrying analysis of recent neural recommendation approaches

Dacrema, Cremonesi, Jannach (2019)

Proceedings of the 13th ACM Conference on Recommender Systems


Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned nonneural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
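The quoted reproducibility criteria require train-test splits that can be reconstructed, and the abstract reports that simple nearest-neighbor heuristics often match the neural proposals. The block below is a minimal sketch, assuming implicit (user, item) interactions, of both ideas: a seeded leave-one-out split that anyone can regenerate, and an item-based kNN baseline scored by hit rate. The function names, cosine similarity, k=50 and the hit-rate metric are illustrative assumptions, not the configuration Dacrema et al. used.

```python
import random
from collections import defaultdict
from math import sqrt

def leave_one_out_split(interactions, seed=42):
    """Hold out one random item per user; the fixed seed makes the split reconstructible."""
    rng = random.Random(seed)
    by_user = defaultdict(set)
    for user, item in interactions:
        by_user[user].add(item)
    train, test = [], {}
    for user in sorted(by_user):  # fixed iteration order keeps the RNG draws deterministic
        items = sorted(by_user[user])
        if len(items) < 2:  # too little history to hold anything out
            train.extend((user, i) for i in items)
            continue
        held_out = rng.choice(items)
        test[user] = held_out
        train.extend((user, i) for i in items if i != held_out)
    return train, test

def item_knn(train, k=50):
    """Keep the k most cosine-similar neighbours per item (binary implicit feedback)."""
    users_of, items_of = defaultdict(set), defaultdict(set)
    for user, item in train:
        users_of[item].add(user)
        items_of[user].add(item)
    neighbours = {}
    for a in users_of:
        row = []
        for b in users_of:
            overlap = len(users_of[a] & users_of[b])
            if a != b and overlap:
                row.append((overlap / sqrt(len(users_of[a]) * len(users_of[b])), b))
        row.sort(key=lambda x: (-x[0], x[1]))  # highest similarity first, ties by item id
        neighbours[a] = row[:k]
    return neighbours, items_of

def recommend(user, neighbours, items_of, n=10):
    """Score unseen items by summed similarity to the user's training items."""
    seen = items_of.get(user, set())
    scores = defaultdict(float)
    for i in seen:
        for sim, j in neighbours.get(i, []):
            if j not in seen:
                scores[j] += sim
    ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
    return [item for item, _ in ranked[:n]]

def hit_rate(test, neighbours, items_of, n=10):
    """Share of users whose held-out item appears in their top-n list."""
    if not test:
        return 0.0
    hits = sum(held in recommend(u, neighbours, items_of, n) for u, held in test.items())
    return hits / len(test)

# Toy usage with made-up interactions (not data from the paper).
interactions = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a"),
                ("u2", "b"), ("u3", "b"), ("u3", "c")]
train, test = leave_one_out_split(interactions, seed=7)
neighbours, items_of = item_knn(train)
print(hit_rate(test, neighbours, items_of, n=2))
```

Publishing the seed together with the splitting procedure is one way to satisfy the "can be reconstructed" requirement without shipping the split files themselves.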



“…Precisely speaking, we used a mix of replication and reproduction [12,35], i.e., we used both artifacts provided by the authors and our own artifacts. For the sake of readability, we will only use the term "reproducibility" in this paper³. We did not apply modifications to the core algorithms.…”

mentioning

confidence: 99%

Are we really making much progress? A worrying analysis of recent neural recommendation approaches

Dacrema, Cremonesi, Jannach (2019)

Proceedings of the 13th ACM Conference on Recommender Systems


“…In the long-run, we hope to provide a platform to the information retrieval, digital library, and recommender systems community that helps conducting more reproducible and robust research in real-world scenarios [34,35]. To achieve this, we plan to add more partners on both sides: platform partners who provide access to real users, and research partners who evaluate their novel algorithms via the living lab.…”

Section: Future Work (mentioning)

confidence: 99%

Online Evaluations for Everyone: Mr. DLib’s Living Lab for Scholarly Recommendations

Beel, Collins, Kopp, et al. (2019)

Lecture Notes in Computer Science

Self Cite


We introduce the first 'living lab' for scholarly recommender systems. This lab allows recommender-system researchers to conduct online evaluations of their novel algorithms for scholarly recommendations, i.e., recommendations for research papers, citations, conferences, research grants, etc. Recommendations are delivered through the living lab's API to platforms such as reference management software and digital libraries. The living lab is built on top of the recommender-system as-a-service Mr. DLib. Current partners are the reference management software JabRef and the CORE research team. We present the architecture of Mr. DLib's living lab as well as usage statistics on the first sixteen months of operating it. During this time, 1,826,643 recommendations were delivered with an average click-through rate of 0.21%.
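The abstract reports 1,826,643 delivered recommendations at an average click-through rate (CTR) of 0.21%. As a sketch of the arithmetic, assuming CTR means clicks divided by delivered recommendations, those two figures imply roughly 3,836 clicks. Because the abstract says "average" and rounds to two decimals, this is an approximation, so the sketch also brackets the rounding interval. The helper names are hypothetical.

```python
def click_through_rate(clicks: int, delivered: int) -> float:
    """CTR as a fraction: clicks divided by delivered recommendations (assumed definition)."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return clicks / delivered

def implied_clicks(ctr_percent: float, delivered: int) -> float:
    """Invert a reported CTR percentage to an approximate click count."""
    return delivered * ctr_percent / 100

delivered = 1_826_643
estimate = implied_clicks(0.21, delivered)  # roughly 3,836 clicks
# 0.21% is rounded, so the true pooled CTR lies somewhere in [0.205%, 0.215%).
low, high = implied_clicks(0.205, delivered), implied_clicks(0.215, delivered)
print(round(estimate), round(low), round(high))
```

If the reported figure is a mean of per-period CTRs rather than a pooled ratio, the implied click count can differ further, so the estimate should not be read as the paper's own number.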


“…Unfortunately, university-based researchers struggle unless they closely collaborate with industry (e.g., [7]) or develop their own infrastructure and user base (e.g., [1]). Without online testing opportunities open to the research communities, they cannot employ online evaluation on a larger scale, which is the de-facto standard evaluation methodology in industry.…”

Section: Research Challenges (mentioning)

confidence: 99%

A Stream-based Resource for Multi-Dimensional Evaluation of Recommender Algorithms

Kille, Lommatzsch, Hopfgartner, et al. (2017)

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Self Cite


Recommender System research has evolved to focus on developing algorithms capable of high performance in online systems. This development calls for a new evaluation infrastructure that supports multi-dimensional evaluation of recommender systems. Today's researchers should analyze algorithms with respect to a variety of aspects including predictive performance and scalability. Researchers need to subject algorithms to realistic conditions in online A/B tests. We introduce two resources supporting such evaluation methodologies: the new data set of stream recommendation interactions released for CLEF NewsREEL 2017, and the new Open Recommendation Platform (ORP). The data set allows researchers to study a stream recommendation problem closely by "replaying" it locally, and ORP makes it possible to take this evaluation "live" in a living lab scenario. Specifically, ORP allows researchers to deploy their algorithms in a live stream to carry out A/B tests. To our knowledge, NewsREEL is the first online news recommender system resource to be put at the disposal of the research community. In order to encourage others to develop comparable resources for a wide range of domains, we present a list of practical lessons learned in the development of the dataset and ORP.
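The NewsREEL abstract describes "replaying" a recorded interaction stream locally before taking an algorithm live on ORP. The block below is a minimal sketch of one common replay scheme, predict-then-update: each event is scored against the model's current top-n list before the model learns from it, so no future information leaks into a prediction. The `(timestamp, user, item)` event tuple, the toy most-popular recommender and the hit-rate score are all hypothetical. This is not the NewsREEL data format or the ORP API.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, user_id, item_id): hypothetical schema

class MostPopular:
    """Toy stand-in recommender: ranks items by clicks seen so far in the stream."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def recommend(self, user: str, n: int) -> List[str]:
        return [item for item, _ in self.counts.most_common(n)]

    def update(self, user: str, item: str) -> None:
        self.counts[item] += 1

def replay(events: Iterable[Event], model, n: int = 5) -> float:
    """Predict-then-update replay; returns the share of events whose item was in the top-n."""
    hits = total = 0
    for _, user, item in sorted(events):  # enforce chronological order
        if item in model.recommend(user, n):  # predict before the model sees the event
            hits += 1
        total += 1
        model.update(user, item)  # only then let the model learn from it
    return hits / total if total else 0.0

# Toy usage with made-up events (not NewsREEL data).
events = [(1, "u1", "a"), (2, "u2", "a"), (3, "u3", "b"), (4, "u1", "a")]
print(replay(events, MostPopular(), n=1))
```

Swapping `MostPopular` for any object with the same `recommend`/`update` methods lets different algorithms be compared on an identical replayed stream, which is the offline counterpart to the live A/B tests the abstract mentions.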
