Context: As the COVID-19 pandemic persists around the world, the scientific community continues to produce and circulate knowledge on the deadly disease at an unprecedented rate. During the early stage of the pandemic, preprints represented nearly 40% of the English-language COVID-19 scientific corpus (6,000+ preprints | 16,000+ articles). As of mid-August 2020, that proportion had dropped to around 28% (13,000+ preprints | 49,000+ articles). Nevertheless, preprint servers remain a key engine in the efficient dissemination of scientific work on this infectious disease. However, given the uncertified nature of the manuscripts hosted on preprint repositories, their integration into the global ecosystem of scientific communication creates serious tensions. This is especially the case for biomedical knowledge, since the dissemination of bad science can have widespread societal consequences.
Scope: In this paper, I propose a robust method that allows repeated monitoring and measurement of the publication rate of COVID-19 preprints. I also introduce a new API called Upload-or-Perish, a micro-API service that enables a client to query a specific preprint manuscript's publication status and associated metadata using a unique ID. This tool is in active development.
Data: I use the COVID-19 Open Research Dataset (CORD-19) to calculate the rate at which the COVID-19 preprint corpus converts to peer-reviewed articles. The CORD-19 dataset includes preprints from arXiv, bioRxiv, and medRxiv.
Methods: I apply conditional fuzzy logic to article titles to determine whether a preprint has a published counterpart in the database. My approach is an important departure from previous studies that rely exclusively on the bioRxiv API to ascertain preprints' publication status. Such reliance is problematic since the level of false positives in bioRxiv metadata could be as high as 37%.
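The title-matching idea described under Methods can be illustrated with a minimal sketch. It is not the paper's implementation: the similarity measure (Python's standard-library `difflib.SequenceMatcher`), the 0.9 threshold, the record fields (`id`, `title`, `date`), and the date-ordering condition are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_published_match(preprint, articles, threshold=0.9):
    """Return the best-matching article for a preprint, or None.

    Hypothetical conditions (assumptions, not taken from the paper):
      1. the article must not predate the preprint (ISO date strings
         compare correctly as plain strings);
      2. title similarity must reach `threshold`.
    """
    best, best_score = None, 0.0
    for article in articles:
        if article["date"] < preprint["date"]:
            continue  # a published version cannot come before its preprint
        score = title_similarity(preprint["title"], article["title"])
        if score >= threshold and score > best_score:
            best, best_score = article, score
    return best


# Toy records invented for illustration; they are not drawn from CORD-19.
articles = [
    {"id": "a1", "date": "2020-05-10",
     "title": "Clinical Features of COVID-19 Patients in Wuhan - A Cohort Study"},
    {"id": "a2", "date": "2020-06-01",
     "title": "Deep learning for protein structure prediction"},
]
preprint = {"id": "p1", "date": "2020-03-02",
            "title": "Clinical features of COVID-19 patients in Wuhan: a cohort study"}

match = find_published_match(preprint, articles)
print(match["id"] if match else None)
```

Matching on normalized titles tolerates the small edits (capitalization, punctuation) that often separate a preprint from its published version. The threshold trades false positives against missed matches and would need tuning against a hand-labeled sample.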
Findings: My analysis reveals that around 15% of the COVID-19 preprint manuscripts in the CORD-19 dataset that were uploaded to arXiv, bioRxiv, and medRxiv between January and early August 2020 were published in a peer-reviewed venue. When compared to the most recent measure available, this represents a two-fold increase in a period of two months. My discussion reviews and theorizes on potential explanations for the low conversion rate of COVID-19 preprints.