Collaborative Learning to Generate Audio-Video Jointly

Kurmi, Vinod K; Bajaj, Vipul; Patro, Badri N.; Venkatesh, K. S.; Namboodiri, Vinay P.; Jyothi, Preethi

doi:10.1109/icassp39728.2021.9413802

Cited by 6 publications

(4 citation statements)

References 21 publications


“…On the one hand, the possibility of artificially generating several data based on the few existing data using a Generative Adversarial Network in order to subsequently train the deep learning models is discussed [45], [46]. On the other hand, transfer learning is presented, which represents the possibility of "further training" another model based on an already trained deep learning model with just a little data and using the finished model for this purpose.…”

Section: B Gesture Recognition With Deep Learning

mentioning

confidence: 99%

A Novel Hybrid Deep Learning Architecture for Dynamic Hand Gesture Recognition

Hax,

Penava,

Krodel

et al. 2024

IEEE Access


Hand gestures are a form of natural communication used in human-computer interaction, however, when gestures are video-based, extraction of features for classification is complex. Current machine learning models struggle to achieve high accuracies when using videos recorded in realistic environments. In this work, we propose a hybrid architecture consisting of a recurrent neural network (RNN), including a long short-term memory layer, on top of a convolutional neural network, to recognize dynamic hand gestures recorded in realistic environments. We used a dataset of 6 dynamic hand gestures: scroll-left, scroll-right, scroll-up, scroll-down, zoom-in, and zoom-out. Our implemented inception-v3 model extracted features and provided the wrapped frame-feature map as input for the RNN, which performs the final classification. The proposed model classifies gestures with an average accuracy of 83.66%. By doing so, we intend to narrow the disparity between realistic environments and high accuracy. Finally, we compare the accuracy of our proposed dynamic hand gesture recognition model with that of the benchmark.


“…Further assessment revealed that the model could learn features to reckon actions at minimum supervision-scene dynamics are viable for representation learning. Several works were proposed for same purpose using GANs [404,406,407] The Motion and Content decomposed GAN (MoCoGAN) was introduced by Tulyakov et al [408] The translation of input to output images can be performed using CGAN-a recurring theme in computer vision, computer graphics, and image processing. This pix2pix model resolves these image-related issues [415][416][417].…”

Section: Video Prediction and Generation

mentioning

confidence: 99%

“…The model efficacy was verified empirically via quantitative and qualitative approaches. This approach has been improved in different ways [360,404,407]. 5. Anime character generation. Apart from requiring experts for routine tasks, animation production and game development are costly.…”

mentioning

confidence: 99%

A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications

et al. 2023


Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate data to train DL frameworks. Usually, manual labeling is needed to provide labeled data, which typically involves human annotators with a vast background of knowledge. This annotation process is costly, time-consuming, and error-prone. Usually, every DL framework is fed by a significant amount of labeled data to automatically learn representations. Ultimately, a larger amount of data would generate a better DL model and its performance is also application dependent. This issue is the main barrier for many applications dismissing the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey on state-of-the-art techniques to deal with training DL models to overcome three challenges including small, imbalanced datasets, and lack of generalization. This survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to address the issue of lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). Then, these solutions were followed by some related tips about data acquisition needed prior to training purposes, as well as recommendations for ensuring the trustworthiness of the training dataset. 
The survey ends with a list of applications that suffer from data scarcity, several alternatives are proposed in order to generate more data in each application including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical system, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview on strategies to tackle data scarcity in DL.


“…Several studies showcase the success of adversarial learning framework in a variety of applications such as image generation [11], [15], audio-generation [20], domain adaptation [9], [23], [25], image in-painting [32], [53], incremental learning [24] and fairness leaning. All of these approaches optimize the network with an adversarial discriminator.…”

Section: Adversarial Learning

mentioning

confidence: 99%

Sensor-invariant Fingerprint ROI Segmentation Using Recurrent Adversarial Learning

Joshi,

Utkarsh,

Kothari

et al. 2021

Preprint

Self Cite


A fingerprint region of interest (ROI) segmentation algorithm is designed to separate the foreground fingerprint from the background noise. All the learning based state-of-the-art fingerprint ROI segmentation algorithms proposed in the literature are benchmarked on scenarios when both training and testing databases consist of fingerprint images acquired from the same sensors. However, when testing is conducted on a different sensor, the segmentation performance obtained is often unsatisfactory. As a result, every time a new fingerprint sensor is used for testing, the fingerprint ROI segmentation model needs to be re-trained with the fingerprint image acquired from the new sensor and its corresponding manually marked ROI. Manually marking fingerprint ROI is expensive because firstly, it is time consuming and more importantly, requires domain expertise. In order to save the human effort in generating annotations required by state-of-the-art, we propose a fingerprint ROI segmentation model which aligns the features of fingerprint images derived from the unseen sensor such that they are similar to the ones obtained from the fingerprints whose ground truth ROI masks are available for training. Specifically, we propose a recurrent adversarial learning based feature alignment network that helps the fingerprint ROI segmentation model to learn sensor-invariant features. Consequently, sensor-invariant features learnt by the proposed ROI segmentation model help it to achieve improved segmentation performance on fingerprints acquired from the new sensor. Experiments on publicly available FVC databases demonstrate the efficacy of the proposed work.
