The number of biomedical research articles published has doubled in the past 20 years. Search-engine-based systems naturally center on searching, but researchers may not have a clear goal in mind, or their goal may take the form of a query that a literature search engine cannot easily answer, such as identifying the most prominent authors in a given field of research. The discovery process can be improved by recommending relevant papers, or other researchers working on related bodies of work. In this paper we describe several recommendation algorithms that were implemented in the Meta platform.
The Meta platform contains over 27 million articles and continues to grow daily. It provides an online map of science that organizes, in real time, all published biomedical research. The ultimate goal is to make it quicker and easier for researchers to (a) filter through scientific papers, (b) find the most important work, and (c) keep up with emerging research results. Meta generates and maintains a semantic knowledge network consisting of five core entities: authors, papers, journals, institutions, and concepts (fields). As papers are published, the Meta data science platform detects, disambiguates, and organizes the mentions of these core entities in each paper, thereby integrating new papers into its knowledge network.
We implemented several recommendation algorithms and evaluated their efficiency in this large-scale biomedical knowledge base. We selected algorithms that could take advantage of the unique environment of the Meta platform: those that make use of diverse datasets, such as citation networks, text content, semantic tag content, and co-authorship information, and those that can scale to very large datasets.
In this paper, we describe the recommendation algorithms that were implemented and report on their relative efficiency and on the challenges of developing and deploying a production recommendation engine.
Related Work
The major online scientific databases currently used by biomedical researchers are PubMed, Google Scholar (GS), Web of Science (WoS), Scopus, Microsoft Academic (MA), Semantic Scholar (S2), and Meta. PubMed is a free online resource developed and maintained by the National Center for Biotechnology Information (NCBI) in the United States (Canese & Weis, 2013; NCBI, 2017). It comprises over 27 million references from the MEDLINE database, in addition to other life science journals and online books (NIH, 2017). PubMed focuses mainly on medical and biomedical literature, whereas the other resources described below cover various scientific fields (Falagas et al., 2008). It provides search filters that help narrow the search results to a specific clinical study or specific topic. It also provides approximately 50 search fields and tags (e.g., first author name, publisher, title) (NCBI, 2017). Search results in PubMed can be sorted based on different criteria such as publication date or releva...