Temporal Cycle-Consistency Learning

Dwibedi, Debidatta; Aytar, Yusuf; Tompson, Jonathan; Sermanet, Pierre; Zisserman, Andrew

doi:10.1109/cvpr.2019.00190

Cited by 241 publications

(308 citation statements)

References 34 publications

Supporting

Mentioning

296

Contrasting

“…Some papers related to cycle-consistency [50,8] introduce self-supervised methods for learning visual correspondence between images or videos from unlabeled videos. They use cycle-consistency as free supervision to learn video representations.…”

Section: Cycle-consistency (mentioning)

confidence: 99%

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

Chen

Kira

AlRegib

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

159

175

Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets which are saturated. Therefore, we first propose two large-scale video DA datasets with much larger domain discrepancy: UCF-HMDB_full and Kinetics-Gameplay. Second, we investigate different DA integration methods for videos, and show that simultaneously aligning and learning temporal dynamics achieves effective alignment even without sophisticated DA methods. Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA³N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets (e.g. 7.9% accuracy gain over "Source only" from 73.9% to 81.8% on "HMDB → UCF", and 10.3% gain on "Kinetics → Gameplay"). The code and data are released at

“…Second, the property demands a bijective mapping between the elements of each sequence hence if the number of elements of the two sequences is not equal, they cannot be consistent. To address both of these problems, we adopt a differential loss function utilising a soft nearest neighbour approach as in [14] that is minimized by making two sequences more consistent with one another. For each embedding, p_i ∈ P, its soft nearest neighbour q_i ∈ Span(Q) is defined by Equation (5):…”

Section: Inter-view Self-supervision (mentioning)

confidence: 99%

Echo-SyncNet: Self-Supervised Cardiac View Synchronization in Echocardiography

Dezaki

Luong

Ginsberg

et al. 2021

IEEE Trans. Med. Imaging

In echocardiography (echo), an electrocardiogram (ECG) is conventionally used to temporally align different cardiac views for assessing critical measurements. However, in emergencies or point-of-care situations, acquiring an ECG is often not an option, hence motivating the need for alternative temporal synchronization methods. Here, we propose Echo-SyncNet, a self-supervised learning framework to synchronize various cross-sectional 2D echo series without any human supervision or external inputs. The proposed framework takes advantage of two types of supervisory signals derived from the input data: spatiotemporal patterns found between the frames of a single cine (intra-view self-supervision) and interdependencies between multiple cines (inter-view self-supervision). The combined supervisory signals are used to learn a feature-rich and low-dimensional embedding space where multiple echo cines can be temporally synchronized. Two intra-view self-supervisions are used: the first is based on the information encoded by the temporal ordering of a cine (temporal intra-view) and the second on the spatial similarities between nearby frames (spatial intra-view). The inter-view self-supervision is used to promote the learning of similar embeddings for frames captured from the same cardiac phase in different echo views. We evaluate the framework with multiple experiments: 1) Using data from 998 patients, Echo-SyncNet shows promising results for synchronizing…

“…Wang et al [38] propose a backward and forward time route to locate the aim area. Dwibedi et al [10,35,43] encoded the video in an embedding space. Then, by learning the cycle consistency representation, the network can align the similar videos.…”

Section: Related Work (mentioning)

confidence: 99%
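The statement above says that videos are encoded in an embedding space and then aligned. A minimal sketch of that idea, assuming per-frame embeddings have already been computed by some encoder, is to match each frame to its nearest neighbour in the other video and check whether the match cycles back to the starting frame. The helper names `nearest_neighbour_indices` and `cycle_consistent_mask` are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import numpy as np


def nearest_neighbour_indices(A, B):
    """For each row of A, return the index of the closest row of B."""
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.argmin(axis=1)


def cycle_consistent_mask(U, V):
    """True where frame i of U maps into V and back to frame i of U."""
    forward = nearest_neighbour_indices(U, V)   # U -> V alignment
    backward = nearest_neighbour_indices(V, U)  # V -> U alignment
    return backward[forward] == np.arange(len(U))


# Toy 1-D "embeddings" for two videos with 3 and 4 frames.
U = np.array([[0.0], [1.0], [2.0]])
V = np.array([[0.1], [0.8], [1.1], [2.2]])
alignment = nearest_neighbour_indices(U, V)
mask = cycle_consistent_mask(U, V)
```

Here `alignment` gives, for each frame of the first video, the index of the matched frame in the second, and `mask` flags which of those matches are cycle-consistent. This hard nearest-neighbour check is not differentiable, so it is suited to evaluation or alignment at inference time rather than training.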