Cem Subakan scite author profile

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and less memory-demanding than the latest speech separation systems with comparable performance.

Two-Step Sound Source Separation: Training On Learned Latent Targets

Tzinis, Venkataramani, Wang, et al. 2020

In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal. In the second step, we train a separation module that operates on the previously learned space. To do so, we also use a scale-invariant signal-to-distortion ratio (SI-SDR) loss function that works in the latent space, and we prove that it lower-bounds the SI-SDR in the time domain. We run various sound separation experiments showing that this approach can obtain better performance than systems that learn the transform and the separation module jointly. The proposed methodology is general enough to be applicable to a large class of end-to-end neural network separation systems.
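For readers unfamiliar with the SI-SDR metric mentioned in the abstract above, the following is a minimal sketch of the commonly used time-domain definition, not the paper's latent-space loss or the authors' code. The function name `si_sdr` and the `eps` stabilizer are illustrative assumptions.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sequences of floats.

    Illustrative helper (hypothetical name): follows the usual definition
    10 * log10(||a*s||^2 / ||a*s - s_hat||^2) with a = <s_hat, s> / ||s||^2.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Optimal scaling of the target so that the residual is orthogonal to it.
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / (target_energy + eps)
    projection = [alpha * t for t in target]
    # Energy of the scaled target versus energy of the residual.
    signal = sum(p * p for p in projection)
    noise = sum((e - p) ** 2 for e, p in zip(estimate, projection))
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```

Because the target is rescaled before the residual is computed, multiplying the estimate by any nonzero constant leaves the value unchanged, which is the sense in which the measure is scale-invariant.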

Attention is All You Need in Speech Separation

Subakan, Ravanelli, Cornell, et al. 2020

Preprint

Neural network alternatives to convolutive audio models for source separation

Venkataramani, Subakan, Smaragdis 2017

The Convolutive Non-Negative Matrix Factorization (NMF) model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a Recurrent Neural Network in the encoder. Experimental results on speech mixtures from the TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSSeval metrics.

A generative modeling approach for interpreting population-level variability in brain structure

Liu, Subakan, Balwani, et al. 2020

Preprint

Understanding how neural structure varies across individuals is critical for characterizing the effects of disease, learning, and aging on the brain. However, disentangling the different factors that give rise to individual variability is still an outstanding challenge. In this paper, we introduce a deep generative modeling approach to find different modes of variation across many individuals. To do this, we start by training a variational autoencoder on a collection of auto-fluorescence images from a little over 1,700 mouse brains at 25 micron resolution. To then tap into the learned factors and validate the model's expressiveness, we developed a novel bi-directional technique to interpret the latent space by making structured perturbations to both the high-dimensional inputs of the network and the low-dimensional latent variables in its bottleneck. Our results demonstrate that by coupling generative modeling frameworks with structured perturbations, it is possible to probe the latent space to provide insights into the representations of brain structure formed in deep neural networks.

Keywords: variational autoencoder · interpretable deep learning · brain architecture and neuroanatomy.
