Lu Han scite author profile

In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. The model is trained with the RNN-T loss, which is well suited to streaming decoding. We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy. We also show that the full attention version of our model beats the state-of-the-art accuracy on the LibriSpeech benchmarks. Our results also show that we can bridge the gap between full attention and limited attention versions of our model by attending to a limited number of future frames.
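The limited-context attention described in the abstract can be illustrated with a small mask sketch. This is not the paper's implementation; the function name `limited_context_mask` and the parameters `left_context` and `right_context` are hypothetical names chosen for illustration. The sketch only shows which frames each position would be allowed to attend to when the left context is bounded and a few future frames are permitted.

```python
def limited_context_mask(num_frames, left_context, right_context=0):
    """Build a boolean attention mask (illustrative, hypothetical helper).

    mask[t][s] is True when frame t may attend to frame s, i.e. when
    s lies within `left_context` frames before t and `right_context`
    frames after t. Positions outside the sequence are simply absent.
    """
    mask = []
    for t in range(num_frames):
        row = [
            (t - left_context) <= s <= (t + right_context)
            for s in range(num_frames)
        ]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Each row is one query frame; "x" marks frames it may attend to.
    for row in limited_context_mask(num_frames=5, left_context=2, right_context=1):
        print("".join("x" if allowed else "." for allowed in row))
```

With `right_context=0` the mask is strictly causal over a bounded window, which is the streaming-friendly case; a small positive `right_context` corresponds to attending to a limited number of future frames at the cost of added latency.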

Detailed 2D-3D Joint Representation for Human-Object Interaction

Liu, Han, et al. 2020

116

Context-aware Battery Management for Mobile Phones

et al. 2008

Detecting work-related stress with a wearable device

Han, Zhang, Chen, et al. 2017

Computers in Industry

Monotonic Recurrent Neural Network Transducer and Decoding Strategies

Tripathi, Han, Sak, et al. 2019

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.


Lu Han

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
