Zhiyun Lu scite author profile

Zhiyun Lu

5 Publications

104 Citation Statements Received

126 Citation Statements Given

Affiliations

Google (United States), University of Southern California, Sichuan University

Publications

Learning compact recurrent neural networks

Sindhwani

Sainath

2016

Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large in size for deployment on mobile devices with memory and latency constraints. In this work, we study mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter sharing schemes. Our goal is to investigate redundancies in recurrent architectures where compression can be admitted without losing performance. A hybrid strategy of using structured matrices in the bottom layers and shared low-rank factors on the top layers is found to be particularly effective, reducing the parameters of a standard LSTM by 75%, at a small cost of 0.3% increase in WER, on a 2,000-hr English Voice Search task.

Speech Sentiment Analysis via Pre-Trained Features from End-to-End ASR Models

Cao

Zhang

et al. 2020

In this paper, we propose to use pre-trained features from end-to-end ASR models to solve the speech sentiment analysis problem as a downstream task. We show that end-to-end ASR features, which integrate both acoustic and text information from speech, achieve promising results. We use an RNN with self-attention as the sentiment classifier, which also provides an easy visualization through attention weights to help interpret model predictions. We use the well-benchmarked IEMOCAP dataset and a new large-scale sentiment analysis dataset, SWBD-senti, for evaluation. Our approach improves the state-of-the-art accuracy on IEMOCAP from 66.6% to 71.7%, and achieves an accuracy of 70.10% on SWBD-senti with more than 49,500 utterances. Index Terms: Speech sentiment analysis, End-to-end ASR pre-training,

A comparison between deep neural nets and kernel acoustic models for speech recognition

Lu

Guo

Garakani

et al. 2016

We study large-scale kernel methods for acoustic modeling and compare to DNNs on performance metrics related to both acoustic modeling and recognition. Measuring perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, on token-error-rates DNN models can be significantly better. We have discovered that this might be attributed to DNN's unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by our findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models, and reduces the gap between them. While effective on Broadcast News, this technique could be also applicable to other tasks.

Selecting β-Divergence for Nonnegative Matrix Factorization by Score Matching

Yang

Oja

2012

Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models

Lu

Han

Cao

2021

Preprint

Although end-to-end automatic speech recognition (e2e ASR) models are widely deployed in many applications, there have been very few studies to understand models' robustness against adversarial perturbations. In this paper, we explore whether a targeted universal perturbation vector exists for e2e ASR models. Our goal is to find perturbations that can mislead the models to predict the given targeted transcript such as "thank you" or empty string on any input utterance. We study two different attacks, namely additive and prepending perturbations, and their performances on the state-of-the-art LAS, CTC and RNN-T models. We find that LAS is the most vulnerable to perturbations among the three models. RNN-T is more robust against additive perturbations, especially on long utterances. And CTC is robust against both additive and prepending perturbations. To attack RNN-T, we find prepending perturbation is more effective than the additive perturbation, and can mislead the models to predict the same short target on utterances of arbitrary length.
