Kazi Nazmul Haque scite author profile

The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data when guided by a small amount of labelled data. We aim to use the generation power of GANs to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features. This limits their use for other audio, such as musical instruments, and even for speech where transcripts are scarce. This paper proposes a novel GAN-based model, the Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Command Dataset (S09)] and a non-speech dataset [Musical Instrument Sound dataset (NSynth)] demonstrate that, using only 5% of labelled data as guidance, GGAN learns significantly better representations than state-of-the-art models.
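The abstract's setup is illustrated below with a minimal Python sketch. Only 5% of the data carries labels as guidance, and the rest is used unlabelled. The helper `split_labelled` is hypothetical. The per-class (stratified) sampling and the fixed seed are assumptions, since the abstract does not say how the labelled subset was chosen.

```python
import random
from collections import defaultdict


def split_labelled(labels, fraction=0.05, seed=0):
    """Split example indices into a small labelled set and a large unlabelled set.

    Hypothetical helper: roughly `fraction` of each class is kept as labelled
    guidance (stratified sampling is an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    labelled = []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one example per class so every class receives some guidance.
        k = max(1, round(fraction * len(indices)))
        labelled.extend(indices[:k])

    labelled_set = set(labelled)
    unlabelled = [i for i in range(len(labels)) if i not in labelled_set]
    return sorted(labelled), unlabelled


# Toy example: ten balanced classes with 100 examples each.
labels = [i % 10 for i in range(1000)]
labelled, unlabelled = split_labelled(labels)
print(len(labelled), len(unlabelled))  # 50 950
```

Sampling per class is a design choice. It keeps every class represented in a very small guidance set, which uniform sampling over the whole dataset does not guarantee.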

High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder

Haque

Rana

Schuller

2020

IEEE Access

Generating high-fidelity conditional audio samples and learning representations from unlabelled audio data are two challenging problems in machine learning research. Recent advances in Generative Adversarial Neural Network (GAN) architectures show great promise in addressing these challenges. Learning powerful representations with a GAN architecture requires superior sample generation quality, which in turn requires an enormous amount of labelled data. In this paper, we address this issue by proposing the Guided Adversarial Autoencoder (GAAE). GAAE can generate superior conditional audio samples from unlabelled audio data using a small percentage of labelled data as guidance. A representation learned from unlabelled data without any supervision is not guaranteed to be useful for a given downstream task. On the other hand, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability. This makes the learned representation of little use for tasks unrelated to that downstream task. The proposed GAAE model also addresses these issues. Using its superior conditional generation, GAAE can learn a representation specific to the downstream task. Furthermore, GAAE learns a second type of representation. This one captures the general attributes of the data and is independent of the downstream task at hand. Experimental results on the S09 and NSynth datasets attest to the superior performance of GAAE compared with state-of-the-art alternatives.
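The input of a generic conditional generator gives a loose illustration of the split between a task-specific representation and a task-independent one. That input is a one-hot class code concatenated with a random style vector. This sketch follows a common conditional-GAN convention, not the GAAE architecture. `conditional_input`, `style_dim` and the standard Gaussian prior are hypothetical choices made for illustration.

```python
import random


def conditional_input(label, num_classes, style_dim, rng):
    """Build a generator input: one-hot class code + random style vector.

    Hypothetical sketch of a common conditional-GAN convention, not the GAAE
    model. The one-hot part fixes *what* to generate (task-specific); the
    style part, drawn from a standard Gaussian (an assumption), varies the
    remaining attributes (task-independent).
    """
    if not 0 <= label < num_classes:
        raise ValueError(f"label must be in [0, {num_classes}), got {label}")
    one_hot = [0.0] * num_classes
    one_hot[label] = 1.0
    style = [rng.gauss(0.0, 1.0) for _ in range(style_dim)]
    return one_hot + style


rng = random.Random(0)
# Two inputs for the same class (label 3) that differ only in their style part.
a = conditional_input(3, num_classes=10, style_dim=4, rng=rng)
b = conditional_input(3, num_classes=10, style_dim=4, rng=rng)
print(a[:10] == b[:10], a[10:] == b[10:])  # True False
```

Holding the label fixed and redrawing the style vector requests different samples of the same class. Changing only the label requests a different class.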

Image denoising and restoration with CNN-LSTM Encoder Decoder with Direct Attention

Haque

Yousuf

Rana

2018

Preprint

Image denoising remains a challenging task in computer vision and image processing. In this paper we propose an encoder-decoder model with direct attention that can denoise and reconstruct highly corrupted images. The model consists of an encoder, which is a convolutional neural network, and a decoder, which is a multilayer Long Short-Term Memory (LSTM) network. The encoder reads an image and captures an abstract representation of it in a vector. The decoder takes that vector together with the corrupted image to reconstruct a clean image. We trained the model on the MNIST handwritten digit database after blacking out the lower half of every image and adding noise on top. Even after such heavy corruption, when it is hard for a human to recognise the content of the images, our model can recover them with minimal error. We compared the proposed model with a convolutional encoder-decoder, and our model was better at generating the missing parts of the images than the convolutional autoencoder.
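The following plain-Python sketch shows the corruption described above: blacking out the lower half of each MNIST digit and then adding noise. The function name `corrupt`, the Gaussian noise model, `noise_std=0.3` and clipping to [0, 1] are assumptions. The abstract does not specify the noise type or level.

```python
import random


def corrupt(image, noise_std=0.3, seed=None):
    """Black out the lower half of a grayscale image, then add Gaussian noise.

    `image` is a list of rows with pixel values in [0, 1], e.g. a 28x28 MNIST
    digit. The Gaussian noise model, `noise_std` and clipping to [0, 1] are
    assumptions of this sketch.
    """
    rng = random.Random(seed)
    half = len(image) // 2
    corrupted = []
    for r, row in enumerate(image):
        # Rows from the middle downwards are set to black (0.0) before noise is added.
        base = row if r < half else [0.0] * len(row)
        # Add independent noise to every pixel and clip back into [0, 1].
        corrupted.append([min(1.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in base])
    return corrupted


digit = [[1.0] * 28 for _ in range(28)]  # stand-in for a real MNIST image
masked_only = corrupt(digit, noise_std=0.0)  # noise disabled: only the blackout remains
print(masked_only[0][0], masked_only[27][0])  # 1.0 0.0
noisy = corrupt(digit, seed=1)
```

Setting `noise_std` to 0 isolates the blackout step. This makes it easy to check the mask separately from the noise before feeding corrupted/clean pairs to a denoising model.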

High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

Haque

Rana

Schuller

2020

Preprint
