ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413854
Tiny Transducer: A Highly-Efficient Speech Recognition Model on Edge Devices

Abstract: This paper proposes an extremely lightweight phone-based transducer model with a tiny decoding graph on edge devices. First, a phone synchronous decoding (PSD) algorithm based on blank label skipping is used to speed up the transducer decoding process. Then, to decrease the deletion errors introduced by the high blank score, a blank label deweighting approach is proposed. To reduce parameters and computation, deep feedforward sequential memory network (DFSMN) layers are used in the transducer encoder, and…
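The two decoding-side ideas summarized in the abstract, PSD-style blank label skipping and blank label deweighting, can be illustrated with a greedy transducer decoding loop. The sketch below is a minimal illustration under assumed interfaces, not the paper's implementation: the joint_fn and pred_fn callables, the blank index, the skipping threshold, and the deweighting penalty are all placeholders.

```python
import numpy as np

BLANK = 0  # assumed index of the blank label (placeholder)

def greedy_decode(enc_out, joint_fn, pred_fn, pred_init,
                  blank_skip_thresh=0.95, blank_deweight=0.5):
    """Greedy transducer decoding with blank skipping and blank deweighting.

    enc_out:   (T, D) array of encoder frames.
    joint_fn:  (enc_frame, pred_state) -> logits over the vocabulary.
    pred_fn:   (label, pred_state) -> updated prediction-network state.
    pred_init: initial prediction-network state.
    The threshold and penalty values are illustrative, not from the paper.
    """
    hyp, state = [], pred_init
    for t in range(enc_out.shape[0]):
        while True:
            logits = np.asarray(joint_fn(enc_out[t], state), dtype=float)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            # PSD-style skipping: if blank dominates, emit nothing for this
            # frame and move on without updating the prediction network.
            if probs[BLANK] > blank_skip_thresh:
                break
            # Blank deweighting: penalise blank to reduce deletion errors.
            logits[BLANK] -= blank_deweight
            label = int(np.argmax(logits))
            if label == BLANK:
                break
            hyp.append(label)
            state = pred_fn(label, state)
    return hyp
```

In the PSD setting described by the abstract, skipping frames whose blank posterior dominates means the search only advances on phone emissions, which is where the reported decoding speed-up would come from.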

Cited by 16 publications (7 citation statements)
References 23 publications
“…This architecture uses an extra token called a blank token. During model testing, the probability of the blank token is usually quite large [13]. This leads to an increase in deletion errors, where the output misses many tokens.…”
Section: Blank Label Re-weighting
confidence: 99%
“…To deal with this problem, we refer to the blank label re-weighting approach [13]. The original article proposed reducing the probability of the blank token.…”
Section: Blank Label Re-weighting
confidence: 99%
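Per the quoted statements, the re-weighting adopted from [13] amounts to lowering the blank score before the search step so that fewer frames collapse to blank. A minimal sketch of that idea, assuming log-probabilities from the joint network; the fixed penalty value and the blank index are placeholders, not values taken from either paper.

```python
import numpy as np

def reweight_blank(log_probs, blank_id=0, penalty=1.0):
    """Subtract a fixed penalty from the blank log-probability.

    log_probs: (..., V) per-step log-probabilities from the joint network.
    penalty:   illustrative constant; larger values cut more deletions but
               can introduce insertion errors if pushed too far.
    """
    out = np.array(log_probs, dtype=float, copy=True)
    out[..., blank_id] -= penalty
    return out
```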
“…In the last few years there has been increasing interest, both in academia and in industry, in moving the computational effort of deep learning inference from centralized clouds and servers closer to the network edge and to endpoint devices (mobiles, smart cameras, etc.). For widely deployed deep learning applications, such as speech recognition [1], performing inference on the endpoint device reduces network bandwidth usage, latency, and the need to expose potentially sensitive data to the network. On the other hand, deep learning inference requires the endpoint device to perform a significant amount of computation, which may not be feasible for very low-end mobile devices.…”
Section: Introduction
confidence: 99%
“…In recent years, end-to-end (E2E) automatic speech recognition (ASR) technology has made great progress with its simplified architecture and competitive performance. Transducer [1,2,3,4,5,6] and attention-based encoder-decoder (AED) [7,8,9] models are two popular E2E frameworks. For both kinds of models, the encoder is crucial for good performance.…”
Section: Introduction
confidence: 99%