This paper describes a text mining approach that utilises the PyLucene search engine and the GrapeNLP grammar engine to extract links between temperature, humidity and the spread of COVID-19 from a vast collection of scientific publications. The approach was developed in response to a Kaggle challenge, issued by a consortium of research groups, to develop text and data mining techniques that can assist the medical community in finding answers to a series of important questions on COVID-19. For this challenge, the consortium provided a large corpus of scientific publications known as the COVID-19 Open Research Dataset (CORD-19). The approach presented in this paper won the competition task of extracting key insights and building summary tables of factors relevant to COVID-19, such as temperature and humidity.