2021
DOI: 10.1186/s13636-021-00222-7

Pronunciation augmentation for Mandarin-English code-switching speech recognition

Abstract: Code-switching (CS) refers to the use of more than one language within a single utterance. It poses a great challenge to automatic speech recognition (ASR) because of the language switches within an utterance, the pronunciation variation of embedded-language words, and severe training data sparsity. This paper focuses on the Mandarin-English CS ASR task. We aim to handle the pronunciation variation and alleviate the sparsity of code-switches by using pronunciation a…

Citations: cited by 2 publications (2 citation statements)
References: 39 publications
“…The language recognition system is used to realize the recognition of mixed-language speech in a single-language system. Long et al. (2021) pointed out that in order to make good use of a large amount of unlabeled data, semi-supervised learning is used to optimize the pronunciation dictionary, acoustic model and language model in the Chinese–English mixed speech recognition system.…”
Section: Related Work (mentioning)
confidence: 99%
“…The language recognition system is used to realize the recognition of mixed-language speech in a single-language system. Long et al. (2021) pointed out that in order to make good use of a large amount of unlabeled data, semi-supervised learning is used to optimize the pronunciation dictionary, acoustic model and language model in the Chinese-English mixed speech recognition system. Part of the label mismatch data in the data set is treated as unsupervised data, and the standard Chinese and English pronunciation dictionary is modified in a semi-supervised manner, thereby solving the related problems caused by the spoken population.…”
Section: Related Work (mentioning)
confidence: 99%