Background The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, which is no longer at scale of human analyses. Objective In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. Methods Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. Similarity of COVID-19 queries is compared to documents, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. Results The methodology was evaluated in the context of the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Particularly, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving on average, 83% of relevant documents in the top 20. Conclusions These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
<b><i>Introduction:</i></b> The SARS-CoV-2 pandemic has led to one of the most critical and boundless waves of publications in the history of modern science. The necessity to find and pursue relevant information and quantify its quality is broadly acknowledged. Modern information retrieval techniques combined with artificial intelligence (AI) appear as one of the key strategies for COVID-19 living evidence management. Nevertheless, most AI projects that retrieve COVID-19 literature still require manual tasks. <b><i>Methods:</i></b> In this context, we present a novel, automated search platform, called Risklick AI, which aims to automatically gather COVID-19 scientific evidence and enables scientists, policy makers, and healthcare professionals to find the most relevant information tailored to their question of interest in real time. <b><i>Results:</i></b> Here, we compare the capacity of Risklick AI to find COVID-19-related clinical trials and scientific publications in comparison with clinicaltrials.gov and PubMed in the field of pharmacology and clinical intervention. <b><i>Discussion:</i></b> The results demonstrate that Risklick AI is able to find COVID-19 references more effectively, both in terms of precision and recall, compared to the baseline platforms. Hence, Risklick AI could become a useful alternative assistant to scientists fighting the COVID-19 pandemic.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.