“…During the early days of self-supervised learning, much research went into handcrafting pre-training tasks, also known as pretext tasks. These handcrafted tasks include geometric transformation prediction [1,2,3], context prediction [4,5], jigsaw puzzle solving [6,7,8,9], temporal-order tasks for videos [10,11,12,13,14], pace prediction in videos [15], image colorization [16], etc. These pretext tasks aim to learn representations that are invariant to transformations, context, and similar nuisance factors.…”
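As an illustration of how such a pretext task is set up, the following is a minimal NumPy sketch of a geometric-transformation (rotation) prediction task in the spirit of the works cited above: each image is rotated by a random multiple of 90 degrees, and the rotation index becomes the self-supervised label a network would be trained to predict. The function name, batch layout, and square-image assumption are ours, not from any cited paper.

```python
import numpy as np

def make_rotation_pretext_batch(images, rng):
    """Build a rotation-prediction pretext batch.

    images: array of shape (N, H, W) with H == W (square images,
            so all four rotations keep the same shape).
    rng:    a numpy Generator used to sample rotations.

    Returns (rotated_images, labels), where labels[i] in {0,1,2,3}
    is the number of 90-degree rotations applied to images[i].
    A classifier trained to predict these labels must learn object
    orientation cues -- the representation is the useful by-product.
    """
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels
```

In practice the rotated batch and labels would feed a standard classification loss (e.g. cross-entropy over the four rotation classes); after pre-training, the label head is discarded and the backbone's features are reused downstream.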