Interspeech 2020
DOI: 10.21437/interspeech.2020-1930

Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition

Cited by 8 publications (4 citation statements) · References 21 publications
“…The most straightforward approach is to simply use multi-speaker TTS to generate the waveform with various acoustic variations [314]- [318]. The other approaches are based on the generation of high-level (more linguistic) features instead of generating the waveform, e.g., encoder features [319] and phoneme features [320], [321]. This approach is similar to the back-translation technique developed in neural machine translation [322].…”
Section: From Representation Learning to Zero Resources (mentioning)
confidence: 99%
“…The most straightforward approach is to simply use multi-speaker TTS to generate the waveform with various acoustic variations [293]- [297]. The other approaches are based on the generation of high-level (more linguistic) features instead of generating the waveform, e.g., encoder features [298] and phoneme features [299], [300]. This approach is similar to the back-translation technique developed in NMT [301].…”
Section: Robustness and Transferability (mentioning)
confidence: 99%
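The cited paper's title suggests the "phoneme features" route mentioned in these statements: text-only data is mapped to phoneme sequences through a lexicon, and the resulting (phoneme, grapheme) pairs are used to pre-train the decoder, much like back-translation in NMT. A minimal sketch of that data-generation step, assuming a toy pronunciation dictionary (a real setup would use a full lexicon such as CMUdict and a G2P model for out-of-vocabulary words):

```python
# Minimal sketch: build (phoneme, grapheme) training pairs from text-only data,
# in the spirit of phoneme-to-grapheme (P2G) pre-training.
# The tiny lexicon below is illustrative only.
LEXICON = {
    "end": ["EH", "N", "D"],
    "to": ["T", "UW"],
    "speech": ["S", "P", "IY", "CH"],
    "recognition": ["R", "EH", "K", "AH", "G", "N", "IH", "SH", "AH", "N"],
}

def words_to_phonemes(words):
    """Look up each word's phone sequence; skip sentences with OOV words."""
    phones = []
    for w in words:
        if w not in LEXICON:
            return None  # a real system would back off to a G2P model here
        phones.extend(LEXICON[w])
    return phones

def make_p2g_pairs(sentences):
    """Turn text-only sentences into (phoneme sequence, grapheme sequence) pairs."""
    pairs = []
    for sent in sentences:
        words = sent.lower().split()
        phones = words_to_phonemes(words)
        if phones is not None:
            graphemes = list(" ".join(words))  # character-level targets
            pairs.append((phones, graphemes))
    return pairs

if __name__ == "__main__":
    text_only = ["end to end speech recognition"]
    for phones, graphemes in make_p2g_pairs(text_only):
        print(phones)
        print(graphemes)
```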
“…To generate acoustically similar errors on text data, we leverage phone information that can be automatically obtained from word sequences using a pronunciation dictionary [38,39]. We propose a modified version of ELECTRA called phone-attentive ELECTRA (P-ELECTRA).…”
Section: Phone-Attentive ELECTRA (mentioning)
confidence: 99%
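The P-ELECTRA statement above describes corrupting text with acoustically plausible errors derived from phone sequences in a pronunciation dictionary. A minimal sketch of one way such errors could be simulated, assuming a toy lexicon and a phone-level edit-distance heuristic (the actual P-ELECTRA generator is a trained model, not this heuristic):

```python
# Minimal sketch: corrupt text with phonetically similar word substitutions,
# using phone sequences obtained from a pronunciation dictionary.
# The toy lexicon and the edit-distance heuristic are illustrative only.
import random

LEXICON = {
    "their": ["DH", "EH", "R"],
    "there": ["DH", "EH", "R"],
    "see":   ["S", "IY"],
    "sea":   ["S", "IY"],
    "ship":  ["SH", "IH", "P"],
    "sheep": ["SH", "IY", "P"],
}

def edit_distance(a, b):
    """Standard Levenshtein distance between two phone sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (pa != pb))
    return dp[-1]

def phonetic_neighbors(word, max_dist=1):
    """Words whose phone sequence is within max_dist edits of the given word."""
    ref = LEXICON[word]
    return [w for w, ph in LEXICON.items()
            if w != word and edit_distance(ref, ph) <= max_dist]

def corrupt(sentence, prob=0.3, seed=0):
    """Randomly replace words with phonetically similar ones to simulate ASR errors."""
    rng = random.Random(seed)
    out = []
    for w in sentence.lower().split():
        cands = phonetic_neighbors(w) if w in LEXICON else []
        out.append(rng.choice(cands) if cands and rng.random() < prob else w)
    return " ".join(out)

print(corrupt("there is a sheep on the sea", prob=1.0))
```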