We propose a new similarity-based recommendation system for large-scale dynamic marketplaces. Our solution consists of an offline process, which generates long-term cluster definitions that group short-lived item listings, and an online system, which uses these clusters first to focus on the important similarity dimensions and then to trade off further similarity against other quality factors such as seller trustworthiness. The system generates these clusters from several hundred million item listings using a large Hadoop MapReduce-based system. The clusters are learned using user queries as the main information source and are therefore biased towards how users conceptually group items. The system is deployed at large scale on several eBay sites and has increased user-engagement and business metrics compared to the previous system. We show that using user queries helps capture similarity better. We also present experiments demonstrating that adapting the ranking function, which controls the trade-off between similarity and quality, to a specific context improves recommendation performance.
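The abstract does not give the ranking function, so the following is only a minimal sketch under assumptions: candidates are first restricted to the seed item's cluster (the "focus" step), then ranked by a linear blend of similarity and seller trustworthiness controlled by a context-dependent weight `alpha`. `Listing`, `recommend`, `CONTEXT_ALPHA`, the context names and all example values are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical listing record; field names are illustrative only.
    item_id: str
    cluster_id: str      # long-term cluster assigned by the offline process
    similarity: float    # similarity to the seed item, assumed in [0, 1]
    seller_trust: float  # one quality factor, assumed in [0, 1]


def recommend(seed_cluster, candidates, alpha, k):
    """Two-stage ranking sketch (assumed, not the deployed ranking function).

    Stage 1 keeps only listings in the seed item's cluster.
    Stage 2 ranks them by a linear trade-off between similarity and quality.
    """
    in_cluster = [c for c in candidates if c.cluster_id == seed_cluster]

    def score(c):
        # alpha close to 1 favours similarity; close to 0 favours quality.
        return alpha * c.similarity + (1.0 - alpha) * c.seller_trust

    return sorted(in_cluster, key=score, reverse=True)[:k]


# Hypothetical per-context weights, standing in for a context-adapted ranking function.
CONTEXT_ALPHA = {"similar_items": 0.8, "trusted_alternatives": 0.5}

candidates = [
    Listing("a", "c1", similarity=0.9, seller_trust=0.2),
    Listing("b", "c1", similarity=0.6, seller_trust=0.95),
    Listing("c", "c2", similarity=0.99, seller_trust=0.99),  # other cluster: filtered out
]

for context, alpha in CONTEXT_ALPHA.items():
    ranked = recommend("c1", candidates, alpha, k=2)
    print(context, [c.item_id for c in ranked])
```

With these toy values the similarity-heavy context ranks `a` before `b`, the quality-heavy context ranks `b` before `a`, and `c` is never recommended because it lies outside the seed cluster, which illustrates why a single fixed weight may not suit every context.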
Computational approaches to generating hypotheses from the biomedical literature have been studied intensively in recent years. Nevertheless, automatically discovering novel, cross-silo biomedical hypotheses from large-scale literature repositories remains a challenge. To address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and a concept-author map on a cluster using the MapReduce framework. We then extract a set of heterogeneous features, such as random-walk-based features, neighborhood features and common-author features. Because the number of potential links to consider in our concept network is large, these features are also extracted on a cluster with the MapReduce framework to address the scalability problem. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. We study how these heterogeneous features, which cover both topological and semantic features derived from the concept network, affect the accuracy of the proposed supervised link discovery process. Finally, we present a case study of hypothesis generation based on the proposed method.
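The abstract does not specify the exact features or how candidate pairs are selected, so the following is a single-machine sketch under stated assumptions: the concept network is an undirected graph stored as a dict of sets, candidate pairs are unlinked pairs that share at least one neighbour in the earlier snapshot, and a pair is labeled positive if it is linked in the later snapshot. Common neighbours, Jaccard and Adamic-Adar stand in for the neighborhood features and shared authors for the common-author feature; the random-walk features and the classifier itself are omitted. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def undirected(edges):
    """Build a symmetric adjacency map {concept: set(neighbours)}."""
    g = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def neighborhood_features(g, u, v):
    nu, nv = g.get(u, set()), g.get(v, set())
    common, union = nu & nv, nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: common neighbours with many links count for less.
        "adamic_adar": sum(1.0 / math.log(len(g[w])) for w in common if len(g[w]) > 1),
    }


def build_training_set(g_t, g_next, authors):
    """Label pairs unlinked at time t by whether they are linked at time t+1."""
    rows = []
    for u, v in combinations(sorted(g_t), 2):
        if v in g_t[u]:
            continue  # already linked at time t, so not a discovery candidate
        if not (g_t[u] & g_t[v]):
            continue  # assumed pruning: keep only pairs two hops apart
        feats = neighborhood_features(g_t, u, v)
        # Common-author feature from a hypothetical concept -> authors map.
        feats["common_authors"] = len(authors.get(u, set()) & authors.get(v, set()))
        label = int(v in g_next.get(u, set()))
        rows.append(((u, v), feats, label))
    return rows


# Toy snapshots: the A-C link appears only in the later snapshot.
g_t = undirected([("A", "B"), ("B", "C"), ("C", "D")])
g_next = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])
authors = {"A": {"smith"}, "C": {"smith", "lee"}}

for pair, feats, label in build_training_set(g_t, g_next, authors):
    print(pair, label, feats)
```

The all-pairs loop is quadratic in the number of concepts, which is the kind of scalability problem the abstract addresses by distributing feature extraction with MapReduce; the resulting labeled rows can then be fed to any binary classifier.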