Background
The Complete Blood Count (CBC) is a commonly used low-cost test that measures white blood cells, red blood cells, and platelets in a person’s blood. It is a useful tool to support medical decisions, as intrinsic variations of each analyte bring relevant insights regarding potential diseases. In this study, we aimed at developing machine learning models for COVID-19 diagnosis through CBCs, unlocking the predictive power of non-linear relationships between multiple blood analytes.
Methods
We collected 809,254 CBCs and 1,088,385 RT-PCR tests for SARS-CoV-2, of which 21% (234,466) were positive, from 900,220 unique individuals. To distinguish COVID-19 from other respiratory infections, we also collected 120,807 CBCs from 16,940 individuals who tested positive for other respiratory viruses. We proposed an ensemble procedure that combines machine learning models for different respiratory infections and analyzed the results in both the first and second waves of COVID-19 cases in Brazil.
Results
We obtain an AUROC above 90% in validation for both scenarios. We show that models built solely on SARS-CoV-2 data are biased, performing poorly in the presence of infections caused by other RNA respiratory viruses.
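For readers unfamiliar with the metric, the AUROC reported above can be read as the probability that a randomly chosen positive sample receives a higher model score than a randomly chosen negative one. The following is a minimal sketch of that definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the study's data or evaluation code.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which the
    positive sample has the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up example (hypothetical scores, not from the study):
# 1 = RT-PCR positive, 0 = RT-PCR negative.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(labels, scores)
print(f"AUROC on toy data: {toy_auroc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it only suits small illustrations. On datasets of the size described here, a rank-based implementation would be the practical choice.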
Conclusions
We demonstrate the potential of a novel machine learning approach for COVID-19 diagnosis based on a CBC and show that aggregating information about other respiratory diseases was essential to ensure robust results. Given its versatility, low cost, and speed, we believe that our tool can be particularly useful in a variety of scenarios, both during the pandemic and after.