2020
DOI: 10.1109/access.2020.2986255

Exploration of End-to-End Framework for Code-Switching Speech Recognition Task: Challenges and Enhancements

Abstract: The end-to-end (E2E) framework has emerged as a viable alternative to conventional hybrid systems in the automatic speech recognition (ASR) domain. Unlike the monolingual case, the challenges faced by an E2E system in the code-switching ASR task include (i) the expansion of the target set to account for the multiple languages involved, (ii) the requirement of a robust target-to-word (T2W) transduction, and (iii) the need for more effective context modeling. In this paper, we aim to address those challenges for reliable traini…

Cited by 8 publications (20 citation statements)
References 36 publications
“…A bilingual ASR system can be implemented by extending each component or model of ASR system from monolingual to bilingual [19][20][21]. Which means all models that are used for implementing ASR are remain the same once it comes for implementing bilingual ASR.…”
Section: Figure 1 Architectural Design Of Contemporary Bilingual ASR ... (mentioning)
confidence: 99%
“…Code-mixing that speech contains both languages within the same sentence. On the other hand, code-switching that speaker switches sentences using both languages [1,19,20,22,31]. The taxonomy of bilingual ASR as presented in Fig.…”
Section: A Taxonomy Of Bilingual ASR (mentioning)
confidence: 99%
“…Connectionist Temporal Classification (CTC) loss [1] has gained popularity in the community due to its computational efficiency and ability to train character-based end-to-end models. Character-based models, requiring no dictionary, lead to greater usability across different languages in both mono- [2,3,4,5,6] and multilingual [7,8,9] setups. However, when dealing with utterances with CS, an outputted word from a CTC model can contain a mixture of characters from different languages because the model lacks the context when predicting the output.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, in addition to multilingual ASR, attention has been paid to design code-switching (CS) ASR. Studies have been conducted for Mandarin-English [12], Hindi-English [13], and French-Arabic [14] and little to no prior work in dialectal code-switching.…”
Section: Introduction (mentioning)
confidence: 99%
“…This drawback restricts the exploitation of End-to-End (E2E) systems. However, recent studies like [13] models Hindi-English CS using E2E attention model, and [15] uses context-dependent target to word transduction, factorized language model and code-switching identification. The authors in [16] proposed two symmetric language-specific encoders to capture the individual language attributes in a transformer-based architecture.…”
Section: Introduction (mentioning)
confidence: 99%