Objective: To provide high-quality data for COVID-19 research, we validated the derived COVID-19 clinical indicators and 22 associated machine learning phenotypes in the Mass General Brigham (MGB) COVID-19 Data Mart.
Materials and Methods: Fifteen reviewers performed a retrospective manual chart review of 150 COVID-19 positive patients in the data mart. To support rapid chart review across a wide range of target data, we provided a Natural Language Processing (NLP)-based chart review tool, the Digital Analytic Patient Reviewer (DAPR). For this work, we designed a dedicated patient summary view and developed 127 new NLP logics to extract COVID-19 relevant medical concepts and target phenotypes. Moreover, we adapted DAPR for research use, so that patient information is used only for approved research purposes, and we enabled fast access to the integrated patient information. Lastly, we conducted a survey to evaluate the difficulty of the validation and the usefulness of DAPR.
Results: The concepts for the COVID-19 positive cohort, COVID-19 index date, COVID-19 related admission, and admission date scored high on all evaluation metrics. However, three phenotypes showed a notable drop in performance relative to their Positive Predictive Value (PPV) in the pre-pandemic population. Based on these results, we removed the three phenotypes from our data mart. In the survey, participants expressed positive attitudes toward using DAPR for chart review. They rated the validation as easy and reported that DAPR helped them find relevant information. Some validation difficulties were also discussed.
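As an illustration of the comparison described above, the following is a minimal sketch of how a phenotype's PPV could be computed from chart-review outcomes and checked against a pre-pandemic baseline. The function names, the `max_drop` threshold, and the sample values are all hypothetical; they are not taken from the study or from DAPR.

```python
def ppv(reviewed):
    """PPV = confirmed positives / all predicted positives.

    `reviewed` holds one boolean per patient the phenotype flagged as
    positive: True if chart review confirmed the phenotype.
    """
    if not reviewed:
        raise ValueError("no predicted positives to evaluate")
    return sum(reviewed) / len(reviewed)


def degraded(current_ppv, baseline_ppv, max_drop=0.10):
    """Flag a phenotype whose PPV fell more than `max_drop` below baseline.

    `max_drop` is a hypothetical threshold, not one used in the study.
    """
    return (baseline_ppv - current_ppv) > max_drop


# Made-up chart-review outcomes for one hypothetical phenotype.
example_reviews = [True, True, False, True, False, False, True, False]
example_ppv = ppv(example_reviews)  # 4 confirmed out of 8 flagged
print(example_ppv, degraded(example_ppv, baseline_ppv=0.90))
```

A phenotype flagged this way would be a candidate for removal or retraining; the actual criterion applied in the study is not specified here.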
Discussion and Conclusion: Using NLP technology in the chart review helped us cope with the challenges of the COVID-19 data validation task and accelerated the process. As a result, we could promptly provide more reliable research data and respond to the COVID-19 crisis. DAPR’s benefits can be extended to other domains, and we plan to operationalize it for wider research groups.