While data quality is recognized as a critical aspect of establishing and using a clinical data research network (CDRN), the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.
Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables without requiring users to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of high-dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu.
The use of “big data” for pediatric hearing research requires new approaches to both data collection and research methods. The widespread deployment of electronic health record systems creates new opportunities and corresponding challenges in the secondary use of large volumes of audiological and medical data. Opportunities include cost-effective hypothesis generation, rapid cohort expansion for rare conditions, and observational studies based on sample sizes in the thousands to tens of thousands. Challenges include finding and forming appropriately skilled teams, access to data, data quality assessment, and engagement with a research community new to big data. The authors share their experience and perspective on the work required to build and validate a pediatric hearing research database that integrates clinical data for over 185,000 patients from the electronic health record systems of three major academic medical centers.