In echocardiography (echo), an electrocardiogram (ECG) is conventionally used to temporally align different cardiac views for assessing critical measurements. However, in emergencies or point-of-care situations, acquiring an ECG is often not an option, which motivates the need for alternative temporal synchronization methods. Here, we propose Echo-SyncNet, a self-supervised learning framework to synchronize various cross-sectional 2D echo series without any human supervision or external inputs. The proposed framework takes advantage of two types of supervisory signals derived from the input data: spatiotemporal patterns found between the frames of a single cine (intra-view self-supervision) and interdependencies between multiple cines (inter-view self-supervision). The combined supervisory signals are used to learn a feature-rich, low-dimensional embedding space in which multiple echo cines can be temporally synchronized. Two intra-view self-supervisions are used: the first is based on the information encoded by the temporal ordering of a cine (temporal intra-view), and the second on the spatial similarities between nearby frames (spatial intra-view). The inter-view self-supervision is used to promote the learning of similar embeddings for frames captured from the same cardiac phase in different echo views. We evaluate the framework with multiple experiments: 1) Using data from 998 patients, Echo-SyncNet shows promising results for synchronizing Apical 2-chamber and Apical 4-chamber cardiac views, which are acquired spatially perpendicular to each other; 2) Using data from 3070 patients, our experiments reveal that the learned representations of Echo-SyncNet outperform