Background
The use of synthetic data in health care is at an early stage. Synthetic data could unlock the value of health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce.
Objective
This work aims to quantify the difference in performance between supervised machine learning models trained on synthetic data and those trained on real data.
Methods
A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression tree (CART), parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model trained on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed.
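To make the evaluation protocol concrete, the following minimal sketch illustrates the train-on-synthetic, test-on-real setup using scikit-learn. The file paths, the "target" column name, and the default hyperparameters are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch of the train-on-synthetic, test-on-real protocol.
# Assumes synthetic data were generated beforehand (e.g., by CART,
# parametric, or Bayesian network generators); paths are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

real = pd.read_csv("real_dataset.csv")            # hypothetical path
synthetic = pd.read_csv("synthetic_dataset.csv")  # hypothetical path

# Hold out a real test set that no model ever trains on.
X_real, y_real = real.drop(columns="target"), real["target"]
X_train_real, X_test, y_train_real, y_test = train_test_split(
    X_real, y_real, test_size=0.3, random_state=0, stratify=y_real)
X_synth, y_synth = synthetic.drop(columns="target"), synthetic["target"]

classifiers = {
    "SGD": SGDClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}

for name, clf in classifiers.items():
    # Train on real, test on real; then train on synthetic, test on real.
    acc_real = accuracy_score(
        y_test, clf.fit(X_train_real, y_train_real).predict(X_test))
    acc_synth = accuracy_score(
        y_test, clf.fit(X_synth, y_synth).predict(X_test))
    print(f"{name}: real={acc_real:.3f} synthetic={acc_synth:.3f} "
          f"deviation={acc_real - acc_synth:.3f}")
```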
Results
A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data deviate in accuracy from their real-data counterparts by 0.177 (18%) to 0.193 (19%), while the other models show smaller deviations of 0.058 (6%) to 0.072 (7%). The winning classifier (that with the highest accuracy) when trained and tested on real data matches the winning classifier trained on synthetic data and tested on real data in 26% (5/19) of cases for CART and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best on real data, winning in 95% (18/19) of cases, but not on synthetic data. When tree-based models are excluded, the winning classifiers for real and synthetic data match in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for CART, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility.
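The winning-classifier agreement reported above can be computed by taking, for each dataset, the classifier with the highest accuracy under each training regime and counting matches across datasets. A minimal sketch of this computation, with placeholder accuracy values (our reconstruction, not the authors' code):

```python
# Hypothetical reconstruction of the winning-classifier match rate.
# `real_acc` and `synth_acc` map dataset -> {classifier: accuracy};
# the entries below are placeholders, not results from the study.
def winner(acc_by_clf):
    # Classifier with the highest accuracy on this dataset.
    return max(acc_by_clf, key=acc_by_clf.get)

def match_rate(real_acc, synth_acc):
    # Fraction of datasets where both regimes pick the same winner.
    matches = sum(winner(real_acc[d]) == winner(synth_acc[d])
                  for d in real_acc)
    return matches / len(real_acc)

real_acc = {"dataset_1": {"SVM": 0.81, "Random forest": 0.86}}
synth_acc = {"dataset_1": {"SVM": 0.79, "Random forest": 0.74}}
print(f"winner matched in {match_rate(real_acc, synth_acc):.0%} of datasets")
```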
Conclusions
The results of this study are promising: models trained on synthetic data show only small decreases in accuracy compared with models trained on real data when both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers show some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must preserve both individual privacy and data utility to instill confidence in health care departments when using such data to inform policy decision-making.