ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413648

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition

Abstract: To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of…


citations
Cited by 21 publications
(14 citation statements)
references
References 31 publications
0
14
0
“…This corresponds to time-synchronous transducer models [6,7,10], where the RNN-T vertical transition is replaced with a diagonal transition, i.e. u = t and U = T .…”
Section: Special Case: Strict Monotonicity (mentioning, confidence 99%)
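The diagonal transition described in this statement can be illustrated with a small forward-algorithm sketch. It assumes a generic scorer `log_prob(t, u, k)` (hypothetical, not from the paper) that gives the log-probability of symbol `k` at frame `t` after `u` labels have been emitted, with index 0 reserved for blank. Every frame emits exactly one symbol, so emitting a label advances both the frame and the label position, and a label sequence longer than the number of frames has no valid alignment.

```python
import math
from typing import Callable, Sequence

BLANK = 0  # assumption: symbol index 0 is the blank


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def strict_monotonic_forward(
    log_prob: Callable[[int, int, int], float],
    labels: Sequence[int],
    num_frames: int,
) -> float:
    """Log of the sum over all strictly monotonic alignments.

    log_prob(t, u, k) is a hypothetical scorer: log-probability of emitting
    symbol k (BLANK or a label) at frame t after u labels were emitted.
    Blank moves (t, u) -> (t+1, u); a label moves diagonally
    (t, u) -> (t+1, u+1). Standard RNN-T would instead emit labels
    vertically, (t, u) -> (t, u+1), without consuming a frame.
    """
    U = len(labels)
    if U > num_frames:
        return -math.inf  # more labels than frames: no valid alignment
    alpha = [0.0] + [-math.inf] * U  # alpha[u] before any frame is consumed
    for t in range(num_frames):
        new_alpha = [-math.inf] * (U + 1)
        for u in range(U + 1):
            score = alpha[u] + log_prob(t, u, BLANK)  # blank: stay at u
            if u > 0:  # label u-1 emitted at frame t: diagonal step
                score = log_add(score, alpha[u - 1] + log_prob(t, u - 1, labels[u - 1]))
            new_alpha[u] = score
        alpha = new_alpha
    return alpha[U]


def uniform(t: int, u: int, k: int) -> float:
    """Toy scorer: the same probability 1/3 for each of {blank, 1, 2}."""
    return math.log(1 / 3)


print(math.exp(strict_monotonic_forward(uniform, [1, 2], 4)))  # ≈ 0.0741 (= 6/81)
```

With the uniform toy scorer, each alignment of 2 labels in 4 frames has probability (1/3)^4, and there are C(4, 2) = 6 ways to place the labels among the frames, so the sum is 6/81.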
“…One setup of our experiments is done on the 2nd release of the TED-LIUM corpus (TED-LIUM-v2) [22]. We use the same phoneme-based transducer model from [10], which has the strict monotonicity constraint as described in Section 2.5. Additionally, the model assumes a first-order dependency which still fits in the equivalence transformation shown in Section 2.3.…”
Section: Phoneme-based Transducer on TED-LIUM-v2 (mentioning, confidence 99%)
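The first-order dependency mentioned here has a useful algorithmic consequence, sketched below under the assumption that "first-order" means the output distribution depends on the label history only through the last emitted non-blank label. This reading, the score table `lp[t][c][k]`, and both function names are hypothetical illustrations, not taken from the paper. Under that assumption, partial hypotheses ending in the same label have identical futures, so the best path in the strictly monotonic lattice can be found exactly with one state per label instead of one per history.

```python
import itertools
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: index 0 is blank and also the "nothing emitted yet" context

Scores = Sequence[Sequence[Sequence[float]]]  # hypothetical layout: lp[t][c][k]


def viterbi_first_order(lp: Scores) -> Tuple[float, List[int]]:
    """Exact best path when scores depend only on the last emitted label.

    lp[t][c][k]: log-score of emitting symbol k at frame t given that the
    last non-blank label is c. Hypotheses sharing the same c are merged,
    so the cost is O(T * V^2) instead of enumerating V^T alignments.
    """
    T, V = len(lp), len(lp[0])
    best = [-math.inf] * V
    best[BLANK] = 0.0
    back: List[List[Tuple[int, int]]] = []  # back[t][c] = (previous context, symbol)
    for t in range(T):
        new_best = [-math.inf] * V
        ptr = [(-1, -1)] * V
        for c in range(V):
            if best[c] == -math.inf:
                continue  # context not reachable yet
            for k in range(V):
                nxt = c if k == BLANK else k  # blank keeps the context
                s = best[c] + lp[t][c][k]
                if s > new_best[nxt]:
                    new_best[nxt], ptr[nxt] = s, (c, k)
        best = new_best
        back.append(ptr)
    c = max(range(V), key=lambda v: best[v])
    score, labels = best[c], []
    for t in reversed(range(T)):  # backtrace, keeping only non-blank symbols
        c, k = back[t][c]
        if k != BLANK:
            labels.append(k)
    return score, labels[::-1]


def brute_force_best(lp: Scores) -> float:
    """Reference check: score every one of the V^T alignments explicitly."""
    T, V = len(lp), len(lp[0])
    best = -math.inf
    for ali in itertools.product(range(V), repeat=T):
        c, s = BLANK, 0.0
        for t, k in enumerate(ali):
            s += lp[t][c][k]
            c = c if k == BLANK else k
        best = max(best, s)
    return best
```

The brute-force function is only a correctness check for tiny inputs. Merging hypotheses that share the same state is the same recombination idea used in classical HMM-based Viterbi decoding.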