Exploration of End-to-End Framework for Code-Switching Speech Recognition Task: Challenges and Enhancements

Sreeram, Ganji; Sinha, Rohit

doi:10.1109/access.2020.2986255

Cited by 8 publications

(20 citation statements)

References 36 publications

“…A bilingual ASR system can be implemented by extending each component or model of ASR system from monolingual to bilingual [19][20][21]. Which means all models that are used for implementing ASR are remain the same once it comes for implementing bilingual ASR.…”

Section: Figure 1 Architectural Design Of Contemporary Bilingual ASR ... (mentioning)

confidence: 99%

Bilingual Automatic Speech Recognition: A Review, Taxonomy and Open Challenges

et al. 2023

In this technological era, smart and intelligent systems that are integrated with artificial intelligence (AI) techniques, algorithms, tools, and technologies, have impact on various aspects of our daily life. Communication and interaction between human and machine using speech becomes increasingly important, since it is an obvious substitute for keyboards and screens in the communication process. Therefore, numerous technologies take advantage of speech such as Automatic Speech Recognition (ASR), where human natural speech for many languages is used as the mean to interact with machines. Majority of the related works for ASR concentrate on the development and evaluation of ASR systems that serve a single language (monolingual) only, such as Arabic, English, Chinese, French, and many others. However, research attempts that combine multiple languages (bilingual and multilingual) during the development and evaluation of ASR systems are very limited. This paper aims to provide comprehensive research background and fundamentals of bilingual ASR, and related works that have combined two languages for ASR tasks from 2010 through 2021. It also formulates research taxonomy and discusses open challenges to bilingual ASR research. Based on our literature investigation, it is clear that bilingual ASR using deep learning approach is highly demanded and is able to provide acceptable performance. In addition, many combinations of two languages such as Arabic-English, Arabic-Malay, and others, are not attempted yet by the research community, which can open new research opportunities. Finally, it is clear that ASR research is moving towards not only bilingual ASR, but also multilingual ASR.

“…Code-mixing that speech contains both languages within the same sentence. On the other hand, code-switching that speaker switches sentences using both languages [1,19,20,22,31]. The taxonomy of bilingual ASR as presented in Fig.…”

Section: A Taxonomy Of Bilingual ASR (mentioning)

confidence: 99%

Bilingual Automatic Speech Recognition: A Review, Taxonomy and Open Challenges

et al. 2023

“…Connectionist Temporal Classification (CTC) loss [1] has gained popularity in the community due to its computational efficiency and ability to train character-based end-to-end models. Character-based models, requiring no dictionary, lead to greater usability across different languages in both mono- [2,3,4,5,6] and multilingual [7,8,9] setups. However, when dealing with utterances with CS, an outputted word from a CTC model can contain a mixture of characters from different languages because the model lacks the context when predicting the output.…”

Section: Introduction (mentioning)

confidence: 99%

Reducing Spelling Inconsistencies in Code-Switching ASR Using Contextualized CTC Loss

Naowarat

Kongthaworn

Karunratanakul

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme duplication, resulting in language-inconsistent spellings. We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies of a character-based nonautoregressive ASR which allows for faster inference. The model trained by CCTC loss is aware of contexts since it learns to predict both center and surrounding letters in a multi-task manner. In contrast to existing CTC-based approaches, CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path. Compared to the same model trained with regular CTC loss, our method consistently improved the ASR performance on both CS and monolingual corpora.

“…Recently, in addition to multilingual ASR, attention has been paid to design code-switching (CS) ASR. Studies have been conducted for Mandarin-English [12], Hindi-English [13], and French-Arabic [14] and little to no prior work in dialectal code-switching.…”

Section: Introduction (mentioning)

confidence: 99%

“…This drawback restricts the exploitation of End-to-End (E2E) systems. However, recent studies like [13] models Hindi-English CS using E2E attention model, and [15] uses context-dependent target to word transduction, factorized language model and code-switching identification. The authors in [16] proposed two symmetric language-specific encoders to capture the individual language attributes in a transformer-based architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR

Chowdhury

Hussein

Abdelali

et al. 2021

Preprint

With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR), handling language and dialectal variation of spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR using self-attention based conformer architecture. We trained the system using Arabic (Ar), English (En) and French (Fr) languages. We evaluate the system performance handling: (i) monolingual (Ar, En and Fr); (ii) multi-dialectal (Modern Standard Arabic, along with dialectal variation such as Egyptian and Moroccan); (iii) code-switching: cross-lingual (Ar-En/Fr) and dialectal (MSA-Egyptian dialect) test cases, and compare with current state-of-the-art systems. Furthermore, we investigate the influence of different embedding/character representations including character vs word-piece; shared vs distinct input symbol per language. Our findings demonstrate the strength of such a model by outperforming state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
