Objective: Cardiovascular diseases are a major cause of mortality worldwide, and electrocardiograms (ECGs) are crucial for diagnosing them. Traditionally, ECGs have been stored in printed formats. However, these printouts, even when scanned, are incompatible with advanced ECG diagnosis software that requires time-series data. Digitizing ECG images would make the extensive paper ECG archives collected worldwide over decades available for training machine learning models for ECG diagnosis. Deep learning models for image processing are promising for this task, but the lack of clinical ECG archives with reference time-series data makes them difficult to train. Data augmentation techniques based on realistic generative data models offer a solution.

Approach: We introduce ECG-Image-Kit, an open-source toolbox for generating synthetic multi-lead ECG images with realistic artifacts from time-series data, aimed at automating the conversion of scanned ECG images into ECG data points. The tool renders real time-series data onto a standard ECG paper background and applies distortions such as text artifacts, wrinkles, and creases.

Main results: As a case study, we used ECG-Image-Kit to create a dataset of 21,801 ECG images from the PhysioNet QT database. On this dataset, we developed and trained a digitization pipeline that combines traditional computer vision with a deep neural network model to convert the synthetic images back into time-series data for evaluation. We assessed digitization quality by calculating the signal-to-noise ratio (SNR) and by comparing clinical parameters, such as the QRS width and the RR and QT intervals, recovered by this pipeline with the ground truth extracted from the ECG time series. The results show that this deep learning pipeline accurately digitizes paper ECGs while preserving clinical parameters, and they highlight the promise of a generative approach to digitization.
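The evaluation described above can be illustrated with a short sketch. It uses a common definition of SNR in decibels, the ratio of reference signal power to the power of the reconstruction error, and assumes the digitized and reference signals are already time-aligned and resampled to the same length; the exact definition and alignment used in the study may differ. The helper names `snr_db` and `rr_intervals_ms` are hypothetical, and R-peak locations are assumed to come from a separate detector.

```python
import numpy as np


def snr_db(reference, reconstructed):
    """SNR in dB: 10*log10(sum(ref^2) / sum((ref - rec)^2)).

    Assumes both signals are aligned and sampled at the same rate.
    """
    reference = np.asarray(reference, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if reference.shape != reconstructed.shape:
        raise ValueError("signals must be aligned and have the same length")
    noise_power = np.sum((reference - reconstructed) ** 2)
    if noise_power == 0:
        return float("inf")  # perfect reconstruction
    return float(10.0 * np.log10(np.sum(reference ** 2) / noise_power))


def rr_intervals_ms(r_peak_samples, fs_hz):
    """RR intervals in milliseconds from R-peak sample indices."""
    return np.diff(np.asarray(r_peak_samples, dtype=float)) / fs_hz * 1000.0
```

A clinical-parameter check then compares the same quantity on both signals, for example `np.mean(np.abs(rr_intervals_ms(r_true, fs) - rr_intervals_ms(r_digitized, fs)))` when both peak lists contain the same beats; QRS width and QT interval would be compared the same way once their onsets and offsets have been delineated.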

Significance: The toolbox has broad applications, including the development of models for ECG image digitization and classification. It currently supports data augmentation for the 2024 PhysioNet Challenge, which focuses on digitizing and classifying paper ECG images.