2020
DOI: 10.2196/16492

Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies

Abstract: Background: Privacy restrictions limit access to protected patient-derived health information for research purposes. Consequently, data anonymization is required to allow researchers data access for initial analysis before granting institutional review board approval. A system installed and activated at our institution enables synthetic data generation that mimics data from real electronic medical records, wherein only fictitious patients are listed. Objective …

Cited by 98 publications (64 citation statements)
References 57 publications
“…From a health care perspective, a range of technical solutions using state-of-the-art machine learning could be developed using health care data with the potential to derive knowledge that can inform and enhance health care policy decision making and risk stratification [36, 48]. Such tools can have a positive impact on health policy and practice, meeting the aims of national health departments, for example, as stated by the Department of Health Permanent Secretary in Northern Ireland, Richard Pengelly, in support of the MIDAS project, “the Department seeks to improve the health and social wellbeing of the people of NI, reduce health inequalities, and to assure the provision of appropriate health and social care services in clinical settings and in the community.”…”
Section: Discussion (mentioning)
confidence: 99%
“…The challenges of building meaningful cohorts and protecting privacy are interconnected and cannot be separated for the type of next-generation research we conducted [7–10, 16, 17]. Many current data-anonymization techniques rely on data manipulation concepts such as aggregation (associating a higher category to some of the features in order to generalize them), subsampling from a larger population in order to achieve the final desired population size, and adding noise to the data set. However, the usefulness of the data set resulting from the above techniques is questionable, and they are not usually safeguarded against reidentification.…”
Section: Discussion (mentioning)
confidence: 99%
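
The three manipulation concepts named in the statement above (aggregation, subsampling, and adding noise) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record fields (`age`, `sbp`), the band width, and the noise scale are hypothetical and are not taken from the cited paper or from any specific anonymization tool.

```python
import random

def generalize_age(age, width=10):
    """Aggregation: replace an exact age with a coarser band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def subsample(records, k, rng):
    """Subsampling: keep k records drawn without replacement."""
    return rng.sample(records, k)

def add_noise(value, scale, rng):
    """Noise addition: perturb a numeric value with zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, scale)

# Hypothetical toy records (age in years, systolic blood pressure in mmHg).
rng = random.Random(0)
records = [
    {"age": 34, "sbp": 118.0},
    {"age": 47, "sbp": 131.0},
    {"age": 52, "sbp": 140.0},
    {"age": 68, "sbp": 125.0},
]

# Apply all three steps: draw a smaller cohort, coarsen age, perturb the measurement.
sample = subsample(records, 3, rng)
masked = [
    {"age_band": generalize_age(r["age"]), "sbp": add_noise(r["sbp"], 2.0, rng)}
    for r in sample
]
```

Each step discards or distorts information, which is one way to read the statement's concern that the usefulness of the resulting data set is questionable.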
“…There may be an ideal amount of noise added for a single query, yet querying the data multiple times may result in insufficient privacy protections. Further, MDClone is not susceptible to a model inversion attack as described by Veale et al [16]; the synthetic data derivative does not contain a one-to-one ratio between the training set B and the members of B′, which is created from a model that is not based on a specific member of B.…”
Section: Discussion (mentioning)
confidence: 99%
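
The point about repeated queries in the statement above can be sketched as follows. This is a hedged illustration only: it assumes independent zero-mean Gaussian noise on each answer, and the true count, noise scale, and number of queries are hypothetical values, not figures from the cited paper or a description of any particular system.

```python
import random
import statistics

def noisy_count(true_count, scale, rng):
    """Answer one query: the true count plus zero-mean Gaussian noise."""
    return true_count + rng.gauss(0.0, scale)

rng = random.Random(42)
true_count = 120  # hypothetical number of matching patients
scale = 5.0       # hypothetical noise level chosen with a single query in mind

# One query reveals the count only up to a few units of noise.
single = noisy_count(true_count, scale, rng)

# Repeating the same query and averaging cancels most of the noise:
# the standard error of the mean shrinks as scale / sqrt(number of queries).
repeated = [noisy_count(true_count, scale, rng) for _ in range(10_000)]
averaged = statistics.fmean(repeated)

print(f"single-query error: {abs(single - true_count):.2f}")
print(f"averaged error:     {abs(averaged - true_count):.2f}")
```

Under these assumptions the averaged answer lands much closer to the true count than a typical single answer, which is why noise calibrated for one query can leave repeated querying insufficiently protected.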
“…In this regard the replication of medical studies with synthetic data by Yale et al substantiate the value of SD for exploratory data analysis, reproducibility on restricted data and more generally education in scientific training (Reiner Benaim et al, 2020). Reproducing medical or clinical studies will be necessary to gain mainstream adoption of GAN produced SD and dispel the scepticism it is generally met with.…”
Section: Benchmarking a Priority (mentioning)
confidence: 97%