Summary
Objectives: To provide an overview of the benefits of clinical data collected as a by-product of the care process, the potential problems with large aggregations of these data, the policy frameworks that have been formulated, and the major challenges in the coming years.
Methods: This report summarizes some of the major observations from AMIA and IMIA conferences held on this admittedly broad topic from 2006 through 2013. This report also includes many unsupported opinions of the author.
Results: The benefits of aggregating larger and larger sets of routinely collected clinical data are well documented and are of great value to society. For methodological reasons, these large data sets will probably never answer all possible clinical questions. Non-traditional sources of health data that are patient-sourced will pose new data science challenges.
Conclusions: If we ever hope to have tools that can rapidly provide evidence for the daily practice of medicine, we need a science of health data, perhaps modeled after the science of astronomy.

As a byproduct of a patient's care, vast quantities of information are stored in electronic databases. The primary reason for collecting this information is to support the care of the patient during an encounter or subsequent encounters. For the purpose of this review, any use of a particular patient's data for a purpose other than that patient's care will be considered reuse. The reuse of patient data for quality assurance and clinical research is not new, but in the context of "big data" it has new importance, both for the prospect of refining the "evidence" that we base medical decisions upon and for the potential to gain new insights in the era of personalized medicine. This review will highlight some of the benefits of reuse, the potential problems with large clinical databases, the policy frameworks that have been formulated, and the major challenges in the coming years.
The volume and availability of health data have increased primarily for two reasons: the mandated adoption of data exchange standards, and the growing variety of types and sources of data. The stimulus to adopt information technology in healthcare is driven by the belief that it can help control costs as well as improve the safety of care. Demonstrating an improvement in the quality or safety of care is much easier than proving that health information technology saves money. For instance, while automation in the clinical laboratories has improved efficiency, in our hospital we have re-purposed personnel to perform other tasks. In the United States, these drivers are embodied in the HITECH Act of 2009, which provides incentives for hospitals and physicians to adopt interoperable electronic health records (EHRs). The trends are similar worldwide.

In three decades (1983 to 2013), the data storage needs of our hospital have increased by about six orders of magnitude, from two gigabytes to approximately two petabytes of data. Although our hospital has merged with another hospital and our EHR now captures all clinical ...