Chandra Dhir scite author profile

Chandra Dhir

3Publications

61Citation Statements Received

48Citation Statements Given

How they've been cited

How they cite others

Affiliations

Apple (United Kingdom), Korea Advanced Institute of Science and Technology, Korea Institute of Brain Science

Publications

Order By: Most citations

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

Adya¹,

Garg²,

Sigtia³

et al. 2020

View full text Add to dashboard Cite

We consider the design of two-pass voice trigger detection systems. We focus on the networks in the second pass that are used to re-score candidate segments obtained from the firstpass. Our baseline is an acoustic model(AM), with BiLSTM layers, trained by minimizing the CTC loss. We replace the BiLSTM layers with self-attention layers. Results on internal evaluation sets show that self-attention networks yield better accuracy while requiring fewer parameters. We add an autoregressive decoder network on top of the self-attention layers and jointly minimize the CTC loss on the encoder and the crossentropy loss on the decoder. This design yields further improvements over the baseline. We retrain all the models above in a multi-task learning(MTL) setting, where one branch of a shared network is trained as an AM, while the second branch classifies the whole sequence to be true-trigger or not. Results demonstrate that networks with self-attention layers yield ∼60% relative reduction in false reject rates for a given false-alarm rate, while requiring 10% fewer parameters. When trained in the MTL setup, self-attention networks yield further accuracy improvements. On-device measurements show that we observe 70% relative reduction in inference time. Additionally, the proposed network architectures are ∼5X faster to train.

show abstract

Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

Shrivastava

Tuzel

et al. 2020

View full text Add to dashboard Cite

We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required. Existing unsupervised methods, during training, generate speech by computing style from the corresponding ground truth sample and use a decoder to combine the style vector with the input text. Training the model in such a way leaks content information into the style vector. The decoder can use the leaked content and ignore some of the input text to minimize the reconstruction loss. At inference time, when the reference speech does not match the content input, the output may not contain all of the content of the input text. We refer to this problem as "content leakage", which we address by explicitly estimating and minimizing the mutual information between the style and the content through an adversarial training formulation. We call our method MIST -Mutual Information based Style Content Separation. The main goal of the method is to preserve the input content in the synthesized speech signal, which we measure by the word error rate (WER) and show substantial improvements over state-of-the-art unsupervised speech synthesis methods.

show abstract

Discriminant Independent Component Analysis

Dhir

Lee

2011

IEEE Trans. Neural Netw.

View full text Add to dashboard Cite

A conventional linear model based on Negentropy maximization extracts statistically independent latent variables which may not be optimal to give a discriminant model with good classification performance. In this paper, a single-stage linear semisupervised extraction of discriminative independent features is proposed. Discriminant independent component analysis (dICA) presents a framework of linearly projecting multivariate data to a lower dimension where the features are maximally discriminant with minimal redundancy. The optimization problem is formulated as the maximization of linear summation of Negentropy and weighted functional measure of classification. Motivated by independence among extracted features, Fisher linear discriminant is used as the functional measure of classification. Experimental results show improved classification performance when dICA features are used for recognition tasks in comparison to unsupervised (principal component analysis and ICA) and supervised feature extraction techniques like linear discriminant analysis (LDA), conditional ICA, and those based on information theoretic learning approaches. dICA features also give reduced data reconstruction error in comparison to LDA and ICA method based on Negentropy maximization.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Chandra Dhir

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

Discriminant Independent Component Analysis

Contact Info

Product

Resources

About