Interspeech 2019
DOI: 10.21437/interspeech.2019-1290

Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition

Abstract: Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model through WFST-based decoding in order to achieve promising results. This is especially important for Mandarin speech recognition, which exhibits a special phenomenon, namely homophones, which cause many substitution errors. The linguistic information introduced by a language model helps to distinguish these substitution errors. In this work, we propose a transforme…
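To make the homophone point concrete, here is a toy, pure-Python sketch (not the paper's transformer model): a smoothed character-bigram language model, trained on an invented three-sentence corpus, disambiguates the Mandarin homophones 作/做 (both pronounced "zuo") in a simulated CTC hypothesis. The corpus and confusion set are assumptions for illustration only.

```python
from collections import defaultdict

# Toy corpus of correct sentences (hypothetical data, for illustration only).
corpus = ["他在北京工作", "他在北京上班", "我在上海工作"]

# Character-bigram and unigram counts for add-one smoothing.
bigram = defaultdict(int)
unigram = defaultdict(int)
for sent in corpus:
    chars = ["<s>"] + list(sent) + ["</s>"]
    for a, b in zip(chars, chars[1:]):
        bigram[(a, b)] += 1
        unigram[a] += 1

vocab = set(unigram) | {"</s>"}

def score(sent):
    """Smoothed bigram probability product (fine for short sentences)."""
    chars = ["<s>"] + list(sent) + ["</s>"]
    p = 1.0
    for a, b in zip(chars, chars[1:]):
        p *= (bigram[(a, b)] + 1) / (unigram[a] + len(vocab))
    return p

# Homophone confusion set: 作 and 做 share the pronunciation "zuo",
# so an acoustic model can easily swap them.
confusions = {"做": ["做", "作"], "作": ["作", "做"]}

def correct(hyp):
    """Replace each confusable character with the variant the LM prefers."""
    out = list(hyp)
    for i, ch in enumerate(out):
        if ch in confusions:
            best = max(confusions[ch],
                       key=lambda c: score("".join(out[:i] + [c] + out[i+1:])))
            out[i] = best
    return "".join(out)

print(correct("他在北京工做"))  # linguistic context favors 工作 over 工做
```

The same disambiguation-by-context principle is what the transformer-based correction model learns end-to-end, without an explicit confusion set.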

Cited by 29 publications (13 citation statements)
References 21 publications
“…A class of neural correction models post-process hypotheses using only the text information, and can be considered as second-pass models [11,12,13]. The models typically use beam search to generate new hypotheses, compared to rescoring where one leverages external language models trained with large text corpora [14].…”
Section: Introduction
confidence: 99%
“…For example, a neural correction model in [11] takes first-pass text hypotheses and generates new sequences to improve numeric utterance recognition [15]. A transformer-based spelling correction model is proposed in [12] to correct the outputs of a connectionist temporal classification model in Mandarin ASR. In addition, [13] leverages text-to-speech (TTS) audio to train an attention-based neural spelling corrector to improve LAS decoding.…”
Section: Introduction
confidence: 99%
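The distinction drawn above between generating new hypotheses and rescoring with an external language model can be illustrated with a minimal n-best rescoring sketch. The n-best list, acoustic scores, and tiny unigram LM below are all invented assumptions, not taken from any of the cited papers.

```python
import math

# Hypothetical first-pass n-best list: (hypothesis, acoustic log-score).
# Scores are invented for illustration.
nbest = [
    ("i scream four ice cream", -3.8),   # top acoustic hypothesis, wrong word
    ("i scream for ice cream", -3.9),
    ("ice cream for ice cream", -4.5),
]

# Stand-in for an external LM trained on large text corpora: here just a
# smoothed unigram model over a toy ten-word corpus.
corpus = "i scream for ice cream you scream for ice cream".split()
freq = {w: corpus.count(w) for w in set(corpus)}
total = len(corpus)

def lm_logprob(sentence):
    # Add-one-smoothed unigram log-probability.
    return sum(math.log((freq.get(w, 0) + 1) / (total + len(freq)))
               for w in sentence.split())

def rescore(nbest, lm_weight=0.5):
    # Interpolate acoustic and LM scores, then re-rank the fixed list.
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))

best, _ = rescore(nbest)
print(best)  # LM evidence promotes "for" over the homophone "four"
```

Rescoring can only re-rank hypotheses already in the list, whereas a correction model with beam search can emit token sequences the first pass never produced.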
“…Then the decoder rephrases this semantic information using correct tokens. In [5], a Transformer-based text correction model is used to correct the output of CTC-based Mandarin ASR systems. In [13], a transfer learning technique is applied to train Transformer-based text correction models.…”
Section: Sequence-to-sequence Text Correction Methods
confidence: 99%
“…There is a plethora of prior work on correcting ASR system outputs [5][6][7]. In this section, three lines of related work are briefly reviewed: pre-trained language models, and pipeline and end-to-end text-based correction methods.…”
Section: Related Work
confidence: 99%
“…Although a single E2E model can already achieve very good ASR performance, its performance can be further improved with a second-pass model. Spelling correction methods [200,215] were proposed by using TTS data to train a separate translation model which is used to correct the hypothesis errors made by the first-pass E2E model. The spelling correction model is a pure text-to-text model without using the speech input.…”
Section: C) Two-pass Models
confidence: 99%
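The training setup described in the quote above (TTS audio used to obtain first-pass errors, which a pure text-to-text model then learns to undo) can be sketched as a data-generation loop. `synthesize`, `first_pass_asr`, and `ERROR_TABLE` are stand-ins simulated here with a fixed confusion; a real system would call a TTS engine and the trained first-pass E2E ASR model.

```python
def synthesize(text):
    # Placeholder for a TTS engine: returns a fake "waveform".
    return f"<audio:{text}>"

# Simulated first-pass ASR confusion (hypothetical).
ERROR_TABLE = {"weather": "whether"}

def first_pass_asr(audio):
    # Placeholder for the first-pass E2E model: decodes the fake waveform
    # and injects the simulated recognition error.
    text = audio[len("<audio:"):-1]
    return " ".join(ERROR_TABLE.get(w, w) for w in text.split())

def make_training_pairs(texts):
    """(noisy hypothesis, reference) pairs for the text-to-text corrector."""
    pairs = []
    for ref in texts:
        hyp = first_pass_asr(synthesize(ref))
        pairs.append((hyp, ref))
    return pairs

pairs = make_training_pairs(["the weather is nice"])
print(pairs)
```

Because the second pass consumes only text, these pairs suffice to train the correction model without any additional transcribed speech.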