Effect of incorporating metadata to the generation of synthetic time series in a healthcare context

Isasa, I.; Hernandez, Mikel; Epelde, Gorka; Londoño, Francisco; Beristáin, Antonio; Alberdi, Ane; Bamidis, Panagiotis D.; Konstantinidis, Evdokimos I.

doi:10.1109/cbms58004.2023.00341

2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS)


Cited by 2 publications

References 18 publications


Synthetic data generation methods in healthcare: A review on open-source tools and methods. Pezoulas, Zaridis, Mylona et al., 2024. Computational and Structural Biotechnology Journal.


Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis. Isasa, Hernandez, Epelde et al., 2024. BMC Med Inform Decis Mak (self-citation).

Background: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research involving personal and clinical data, either as a standalone tool or combined with other privacy-enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while complying with the relevant regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects' static metadata and a longitudinal component (a set of time-dependent measurements), which makes it challenging to produce coherent synthetic counterparts.

Methods: Three synthetic time series generation approaches were defined and compared in this work: (A1) generating only the metadata and coupling it with the real time series from the original data; (A2) generating the metadata and the time series separately and joining them afterwards; and (A3) generating the metadata and the time series jointly. The comparison was carried out with two synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the DöppelGANger (DGAN). The experiments used three healthcare-related longitudinal datasets: (1) Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga, (2) a hypotension subset derived from the MIMIC-III v1.4 database, and (3) a lifelogging dataset named PMData.

Results: Three key dimensions of the generated synthetic data were assessed: (1) resemblance to the original data, (2) utility, and (3) privacy level. The best-performing approach varies with the assessed dimension and metric.

Conclusion: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. Coupling synthetic metadata with real time series (A1) and jointly generating synthetic time series and metadata (A3) are both competitive, whereas generating time series and metadata separately (A2) appears to perform worse overall.
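The abstract describes the three approaches only at the level of data flow. The Python sketch below illustrates how A1, A2 and A3 differ in where each synthetic record's static metadata and time series come from. It is not the authors' implementation: `gen_metadata`, `gen_series` and `gen_joint` are hypothetical placeholders (column-wise resampling and Gaussian jitter) standing in for WGAN-GP or DGAN, the record layout is assumed, and pairing records by position in A1 and A2 is an assumption.

```python
import random
from typing import Dict, List, TypedDict


class Subject(TypedDict):
    meta: Dict[str, float]  # static metadata (e.g. age, weight); assumed layout
    series: List[float]     # longitudinal measurements for this subject


def _jitter(x: float, rng: random.Random, scale: float = 0.05) -> float:
    # Multiplicative Gaussian noise: a placeholder for a learned generator.
    return x * (1.0 + rng.gauss(0.0, scale))


def gen_metadata(real: List[Subject], n: int, rng: random.Random) -> List[Dict[str, float]]:
    # Hypothetical stand-in for a tabular generator: resamples each
    # metadata column independently from the real marginals.
    keys = list(real[0]["meta"])
    return [{k: rng.choice(real)["meta"][k] for k in keys} for _ in range(n)]


def gen_series(real: List[Subject], n: int, rng: random.Random) -> List[List[float]]:
    # Hypothetical stand-in for a time-series generator: a jittered copy
    # of a randomly chosen real series.
    return [[_jitter(v, rng) for v in rng.choice(real)["series"]] for _ in range(n)]


def gen_joint(real: List[Subject], n: int, rng: random.Random) -> List[Subject]:
    # Hypothetical stand-in for a joint generator: metadata and series are
    # derived from the same source subject, so they stay mutually consistent.
    out: List[Subject] = []
    for _ in range(n):
        s = rng.choice(real)
        out.append({
            "meta": {k: _jitter(v, rng) for k, v in s["meta"].items()},
            "series": [_jitter(v, rng) for v in s["series"]],
        })
    return out


def approach_a1(real: List[Subject], rng: random.Random) -> List[Subject]:
    # A1: synthetic metadata coupled with the real time series (paired by position).
    metas = gen_metadata(real, len(real), rng)
    return [{"meta": m, "series": list(s["series"])} for m, s in zip(metas, real)]


def approach_a2(real: List[Subject], n: int, rng: random.Random) -> List[Subject]:
    # A2: metadata and series generated separately, then joined by position;
    # the pairing carries no learned relationship between the two parts.
    metas = gen_metadata(real, n, rng)
    series = gen_series(real, n, rng)
    return [{"meta": m, "series": s} for m, s in zip(metas, series)]


def approach_a3(real: List[Subject], n: int, rng: random.Random) -> List[Subject]:
    # A3: metadata and series generated jointly.
    return gen_joint(real, n, rng)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    real: List[Subject] = [
        {"meta": {"age": 25.0, "weight": 70.0}, "series": [90.0, 120.0, 150.0]},
        {"meta": {"age": 60.0, "weight": 85.0}, "series": [80.0, 105.0, 125.0]},
    ]
    rng = random.Random(0)
    for name, data in (("A1", approach_a1(real, rng)),
                       ("A2", approach_a2(real, 3, rng)),
                       ("A3", approach_a3(real, 3, rng))):
        print(name, data[0])
```

In this toy setup, A3 keeps each record's metadata and series tied to one source subject, while A2 joins two independently generated parts. This gives one intuition for why separate generation can lose coherence; it does not reproduce the paper's experiments.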
