The adoption of electronic health records (EHRs) has made patient data increasingly accessible, precipitating the development of various clinical decision support systems and data-driven models to help physicians. However, missing data are common in EHR-derived datasets and can introduce significant uncertainty or even invalidate the use of a predictive model. Machine learning (ML)-based imputation methods have shown promise in various domains for estimating missing values and reducing uncertainty to the point that a predictive model can be employed. We introduce Autopopulus, a novel framework that enables the design and evaluation of various autoencoder architectures for efficient imputation on large datasets. Autopopulus implements existing autoencoder methods as well as a new technique that outputs a range of estimated values (rather than point estimates), and demonstrates a workflow that helps users make an informed decision on an appropriate imputation method. To further illustrate Autopopulus' utility, we use it to identify not only which imputation methods most accurately impute a large clinical dataset, but also which imputation methods enable downstream predictive models to achieve the best performance for prediction of chronic kidney disease (CKD) progression. Clinical relevance: This work enables the investigation of autoencoders for imputation of large clinical datasets and examines the impact of imputation on downstream tasks rather than in isolation.
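As a rough illustration of what evaluating imputation accuracy can involve, the sketch below hides a fraction of the observed entries in a synthetic matrix, imputes them, and scores the result only on the hidden entries. It is a minimal sketch under stated assumptions and not the Autopopulus API: the function names (`mask_observed`, `mean_impute`, `masked_rmse`), the synthetic data, and the 20% masking rate are all hypothetical, and column-mean imputation stands in for an autoencoder so that the example stays self-contained.

```python
import numpy as np


def mask_observed(X, frac, rng):
    """Hide a random fraction of the observed entries (hypothetical helper).

    Returns a corrupted copy of X and a boolean mask of the entries that were hidden.
    """
    observed = ~np.isnan(X)
    hide = observed & (rng.random(X.shape) < frac)
    X_corrupt = X.copy()
    X_corrupt[hide] = np.nan
    return X_corrupt, hide


def mean_impute(X):
    """Fill each missing entry with its column mean (a stand-in for an autoencoder)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def masked_rmse(X_true, X_imputed, hide):
    """Root-mean-square error computed only on the artificially hidden entries."""
    diff = X_true[hide] - X_imputed[hide]
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data (an assumption for illustration; no clinical data is used).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan  # entries that are missing to begin with

# Hide 20% of the observed entries so the ground truth is known for scoring.
X_corrupt, hide = mask_observed(X, 0.2, rng)
X_hat = mean_impute(X_corrupt)
rmse = masked_rmse(X, X_hat, hide)
print(f"RMSE on hidden entries: {rmse:.3f}")
```

Scoring only the artificially hidden entries matters because the originally missing values have no ground truth. Comparing downstream predictive performance across imputation methods, as the abstract describes, would be a separate step not shown here.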
The populations impacted most by COVID are also impacted by racism and related social stigma; however, traditional surveillance tools may not capture the intersectionality of these relationships. We conducted a detailed assessment of diverse surveillance systems and databases to identify characteristics, constraints and best practices that might inform the development of a novel COVID surveillance system that achieves these aims. We used subject area expertise, an expert panel and CDC guidance to generate an initial list of N > 50 existing surveillance systems as of 29 October 2020, and systematically excluded those not advancing the project aims. This yielded a final reduced group (n = 10) of COVID surveillance systems (n = 3), other public health systems (n = 4) and systems tracking racism and/or social stigma (n = 3), which we evaluated using CDC evaluation criteria and Critical Race Theory. Overall, the most important contribution of COVID-19 surveillance systems is their real-time (e.g., daily) or near-real-time (e.g., weekly) reporting; however, they are severely constrained by the lack of complete data on race/ethnicity, making it difficult to monitor racial/ethnic inequities. Other public health systems have validated measures of psychosocial and behavioral factors and some racism or stigma-related factors but lack the timeliness needed in a pandemic. Systems that monitor racism report historical data on, for instance, hate crimes, but do not capture current patterns, and it is unclear how representative the findings are. Though existing surveillance systems offer important strengths for monitoring health conditions or racism and related stigma, new surveillance strategies are needed to monitor their intersecting relationships more rigorously.