2021
DOI: 10.48550/arxiv.2108.01129
Preprint

Decoupling recognition and transcription in Mandarin ASR

Abstract: Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio → Hanzi into two sub-tasks: (1) audio → Pinyin and (2) Pinyin → Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio → Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the be…
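To illustrate the two-stage factoring described in the abstract, here is a minimal Python sketch; it is not the paper's implementation. Both stages are hypothetical stubs, and the function names, the toy Pinyin-to-Hanzi table, and the dummy audio input are assumptions made purely for illustration.

# Hypothetical sketch of the factored audio -> Pinyin -> Hanzi pipeline.
# Both stages are stand-ins; the paper's actual models are not reproduced here.

from typing import List


def audio_to_pinyin(audio_frames: List[float]) -> List[str]:
    """Stage 1 (stub): an acoustic model would emit a Pinyin sequence here."""
    # Placeholder hypothesis standing in for a real recognizer's output.
    return ["ni3", "hao3"]


# Toy Pinyin -> Hanzi table; a real second stage would use a sequence model
# to resolve the many homophonous characters that share one Pinyin syllable.
PINYIN_TO_HANZI = {
    ("ni3", "hao3"): "你好",
}


def pinyin_to_hanzi(pinyin: List[str]) -> str:
    """Stage 2 (stub): convert the Pinyin hypothesis into Hanzi."""
    return PINYIN_TO_HANZI.get(tuple(pinyin), " ".join(pinyin))


if __name__ == "__main__":
    hypothesis = audio_to_pinyin(audio_frames=[0.0])  # dummy audio
    print(pinyin_to_hanzi(hypothesis))                # prints: 你好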

Cited by 1 publication (3 citation statements)
References 50 publications (55 reference statements)

“…As shown in Table 1, we compare SCaLa with state-of-the-art ASR systems including hybrid [28], end-to-end [5,29], and self-supervised learning [10,30]. Numerically, SCaLa outperforms the traditional CTC models [20] with 2.84% and 1.38% CER reductions on reading and spontaneous speech data, respectively.…”
Section: Results (mentioning)
confidence: 99%
“…4.3. Experimental results also show that SCaLa significantly outperforms hybrid chain models [28], end-to-end CTC-Conformer systems [29], self-supervised learning systems [10,30], and methods with phoneme masking [5].…”
Section: Results (mentioning)
confidence: 99%