Objectives: To objectively evaluate freely available data profiling software tools using healthcare data. Design: Data profiling tools were evaluated for their capabilities using publicly available information and data sheets. Following initial assessment, several underwent further detailed evaluation for application to healthcare data, using a synthetic dataset of 1000 patients and associated data structured according to a common health data model, and the tools were scored based on their functionality with this dataset. Setting: Improving the quality of healthcare data for research use is a priority. Profiling tools can assist by evaluating datasets across a range of quality dimensions. Several freely available software packages with profiling capabilities exist, but healthcare organisations often have limited data engineering capability and expertise. Participants: 28 profiling tools, 8 of which underwent evaluation on a synthetic dataset of 1000 patients. Results: Of 28 potential profiling tools initially identified, 8 showed high potential for applicability to healthcare datasets based on available documentation, of which two performed consistently well for these purposes across multiple tasks, including determination of completeness, consistency, uniqueness, validity and accuracy, and provision of distribution metrics. Conclusions: Numerous freely available profiling tools are serviceable for potential use with health datasets, of which at least two demonstrated high performance across a range of technical data quality dimensions based on testing with a synthetic health dataset and common data model. The appropriate tool choice depends on factors including underlying organisational infrastructure and the level of data engineering and coding expertise, but freely available tools exist that can help profile health datasets for research use and inform curation activity.
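To make the quality dimensions named in this abstract more concrete, the following is a minimal sketch of how completeness, uniqueness and validity might be computed for a single column. It is not taken from any of the evaluated tools: the function `profile_column`, the sample `patients` rows and the plausible birth-year range are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional, Sequence


def profile_column(
    values: Sequence[object],
    is_valid: Optional[Callable[[object], bool]] = None,
) -> dict:
    """Return simple quality metrics for one column (illustrative only)."""
    total = len(values)
    # Treat None and empty strings as missing; real tools use richer rules.
    present = [v for v in values if v is not None and v != ""]

    metrics = {
        # Completeness: share of rows with a non-missing value.
        "completeness": len(present) / total if total else 0.0,
        # Uniqueness: share of non-missing values that are distinct.
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
    }
    if is_valid is not None:
        # Validity: share of non-missing values passing a caller-supplied rule.
        passed = sum(1 for v in present if is_valid(v))
        metrics["validity"] = passed / len(present) if present else 0.0
    return metrics


# Hypothetical synthetic rows; the field names are assumptions, not a real data model.
patients = [
    {"patient_id": "P001", "birth_year": 1980, "sex": "F"},
    {"patient_id": "P002", "birth_year": None, "sex": "M"},
    {"patient_id": "P002", "birth_year": 2190, "sex": "M"},
    {"patient_id": "P004", "birth_year": 1975, "sex": ""},
]

id_metrics = profile_column([p["patient_id"] for p in patients])
year_metrics = profile_column(
    [p["birth_year"] for p in patients],
    is_valid=lambda year: 1900 <= year <= 2024,  # assumed plausible range
)
print(id_metrics)
print(year_metrics)
```

Dedicated profiling tools typically add consistency checks across tables and distribution summaries on top of per-column metrics like these.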
Objectives: The value of healthcare data is being increasingly recognised, including the need to improve health dataset utility. There is no established mechanism for evaluating healthcare dataset utility, making it difficult to evaluate the effectiveness of activities improving the data. To describe the method for generating and involving the user community in developing a proposed framework for evaluation and communication of healthcare dataset utility for given research areas. Methods: An initial version of a matrix to review datasets across a range of dimensions was developed based on previous published findings regarding healthcare data. This was used to initiate a design process through interviews and surveys with data users representing a broad range of user types and use cases, to help develop a focused framework for characterising datasets. Results: Following 21 interviews, 31 survey responses and testing on 43 datasets, five major categories and 13 subcategories were identified as useful for a dataset, including Data Model, Completeness and Linkage. Each subcategory was graded to facilitate rapid and reproducible evaluation of dataset utility for specific use cases. Testing of applicability to >40 existing datasets demonstrated potential usefulness for subsequent evaluation in real-world practice. Discussion: The research has developed an evidence-based initial approach for a framework to understand the utility of a healthcare dataset. It is likely to require further refinement following wider application, and additional categories may be required. Conclusion: The process has resulted in a user-centred designed framework for objectively evaluating the likely utility of specific healthcare datasets, and should therefore be of value both for potential users of health data and for data custodians seeking to identify the areas that provide the optimal value for data curation investment.
Background: Numerous clinical studies are now underway investigating aspects of COVID-19. The aim of this study was to identify a selection of national and/or multicentre clinical COVID-19 studies in the United Kingdom, to examine the feasibility and outcomes of documenting the most frequent data elements common across studies, in order to rapidly inform future study design and demonstrate proof-of-concept for further subject-specific study data element mapping to improve research data management. Methods: 25 COVID-19 studies were included. For each, information regarding the specific data elements being collected was recorded. Data elements collated were arbitrarily divided into categories for ease of visualisation. Elements which were most frequently and consistently recorded across studies are presented in relation to their relative commonality. Results: Across the 25 studies, 261 data elements were recorded in total. The most frequently recorded 100 data elements were identified across all studies and are presented with relative frequencies. Categories with the largest numbers of common elements included demographics, admission criteria, medical history and investigations. Mortality and need for specific respiratory support were the most common outcome measures, with specific studies also including a range of other outcome measures. Conclusion: The findings of this study demonstrate that it is feasible to collate specific data elements recorded across a range of studies investigating a specific clinical condition in order to identify those elements which are most common among studies. These data may be of value for those establishing new studies, and may allow researchers to rapidly identify studies collecting data of potential use, hence minimising duplication and increasing data re-use and interoperability.
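The collation step described in this abstract, counting how many studies record each data element and reporting relative frequencies, can be sketched as follows. This is only an illustration under stated assumptions: the function `element_frequencies`, the study names and the element lists are hypothetical and are not the study's data or method.

```python
from collections import Counter


def element_frequencies(studies: dict) -> list:
    """Rank data elements by the number of studies that record them.

    Returns (element, study_count, relative_frequency) tuples, most common
    first, with ties broken alphabetically so the order is deterministic.
    """
    counts = Counter()
    for elements in studies.values():
        # Use a set so an element listed twice in one study counts once.
        counts.update(set(elements))

    n_studies = len(studies)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count, count / n_studies) for name, count in ranked]


# Hypothetical example input: study name -> data elements it collects.
example_studies = {
    "study_a": ["age", "sex", "mortality"],
    "study_b": ["age", "mortality", "age"],
    "study_c": ["age", "bmi"],
}

ranking = element_frequencies(example_studies)
for name, count, share in ranking:
    print(f"{name}: recorded in {count} studies ({share:.0%})")
```

In practice the harder part is likely to be mapping differently named fields to the same element before counting, which this sketch does not attempt.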