Summary
Objective: To define the key concepts that determine whether a system for collecting, aggregating and processing routine clinical data for research is fit for purpose. Methods: Literature review and shared experiential learning from research using routinely collected data. We excluded socio-cultural issues and privacy and security issues, as our focus was to explore linking clinical data. Results: Six key concepts describe data: (1) Data quality: the core overarching concept (are these data fit for purpose?). (2) Data provenance: defined as how data came to be, incorporating the concepts of lineage and pedigree. Mapping this process requires metadata. New variables derived during data analysis have their own provenance. (3) Data extraction errors and (4) data processing errors, which are the responsibility of the investigator extracting the data but need quantifying. (5) Traceability: the capability to identify the origins of any data cell within the final analysis table, which is essential for good governance and almost impossible without a formal system of metadata. (6) Curation: storing data and look-up tables in a way that allows future researchers to carry out further research or review earlier findings. Conclusion: There are common, distinct steps in processing data; the quality of any metadata may be predictive of the quality of the process. Outputs based on routine data should include a review of the process from data origin to curation, and should publish information about their data provenance and processing method.
Summary
Objectives: To perform a requirements analysis of the barriers to conducting research that links primary care, genetic and cancer data. Methods: We extended our initial data-centric approach to include socio-cultural and business requirements. We created reference models of core data requirements common to most studies using unified modelling language (UML), dataflow diagrams (DFD) and business process modelling notation (BPMN). We conducted a stakeholder analysis and constructed DFD and UML diagrams for use cases based on simulated research studies. We used research output as a sensitivity analysis. Results: Differences between the reference model and use cases identified study-specific data requirements. The stakeholder analysis identified tensions, changes in specification, some indifference from data providers, and enthusiastic informaticians urging inclusion of socio-cultural context. We identified requirements to collect information at three levels: micro (data items, which need to be semantically interoperable), meso (the medical record and data extraction), and macro (the health system and socio-cultural issues). BPMN clarified complex business requirements among data providers and vendors, and additional geographical requirements for patients to be represented in both linked datasets. High-quality research output was the norm for most repositories. Conclusions: Reference models provide high-level schemata of the core data requirements. However, business requirements modelling identifies stakeholder issues and what needs to be addressed to enable participation.