Background:: Synthetic Electronic Health Records (EHRs) are becoming increasingly popular as a privacy enhancing technology. However, for longitudinal EHRs specifically, little research has been done into how to properly evaluate synthetically generated samples. In this article, we provide a discussion on existing methods and recommendations when evaluating the quality of synthetic longitudinal EHRs.
Methods:: We recommend to assess synthetic EHR quality through similarity to real EHRs in low-dimensional projections, accuracy of a classifier discriminating synthetic from real samples, performance of synthetic versus real trained algorithms in clinical tasks, and privacy risk through risk of attribute inference. For each metric we discuss strengths and weaknesses, next to showing how it can be applied on a longitudinal dataset.
Results:: To support the discussion on evaluation metrics, we apply discussed metrics on a dataset of synthetic EHRs generated from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) repository.
Conclusions:: The discussion on evaluation metrics provide guidance for researchers on how to use and interpret different metrics when evaluating the quality of synthetic longitudinal EHRs.