The pandemic caused by the SARS-CoV-2 virus has generated numerous scientific texts on topics including symptoms, drugs, treatments, and vaccines. These documents are written in natural language and lack a computer-processable structure, which makes manual analysis tedious and time-consuming. Information Retrieval (IR) approaches are therefore increasingly necessary to analyse Spanish scientific texts. This paper presents approaches based on text analysis levels to retrieve relevant documents about COVID-19 from the Spanish literature. The approaches comprise preprocessing the scientific texts, retrieving relevant documents using three text analysis levels (probabilistic, similarity, and semantic), and an evaluation process. Their main aim is to rank the scientific documents by relevance to the input question. The evaluation uses 100 COVID-19 questions and a dataset of 249,000 scientific texts in Spanish. The results show that the probabilistic approach, supported by document filtering with the Latent Dirichlet Allocation (LDA) algorithm, achieved an F-measure of 85%.
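To make the ranking and evaluation steps concrete, the following is a minimal sketch of probabilistic ranking and F-measure scoring. It assumes Okapi BM25 as the probabilistic scoring function and plain whitespace tokenization as the preprocessing step; the abstract specifies neither, so the functions `tokenize`, `bm25_rank`, and `f_measure`, the parameters `k1` and `b`, and the toy documents are all illustrative assumptions rather than the authors' implementation. The LDA filtering step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def bm25_rank(question, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in set(tokenize(question)):
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall over sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus (accents omitted to keep tokenization trivial).
documents = [
    "vacuna covid eficacia",
    "receta de cocina",
    "sintomas covid fiebre tos",
]
ranking = bm25_rank("vacuna covid", documents)
top_two = [index for index, _ in ranking[:2]]
```

In this toy corpus the first document matches both question terms and ranks first, while the unrelated second document receives a score of zero. Calling `f_measure(top_two, relevant)` with a hand-labelled set of relevant indices mirrors, on a small scale, the kind of per-question evaluation the abstract reports.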