Multilingual and code-switching ASR challenges for low resource Indian languages

Diwan, Anuj; Vaideeswaran, Rakesh; Shah, Sanket; Singh, Ankita; Raghavan, Srinivasa; Khare, Shreya; Unni, Vinit; Vyas, Saurabh; Rajpuria, Akash; Yarra, Chiranjeevi; Mittal, Ashish; Ghosh, Prasanta Kumar; Jyothi, Preethi; Bali, Kalika; Seshadri, Vivek; Sitaram, Sunayana; Bharadwaj, Samarth; Nanavati, Jai; Nanavati, Raoul; Sankaranarayanan, Karthik; Seeram, Tejaswi; Abraham, Basil

doi:10.48550/arxiv.2104.00235

Cited by 6 publications

(7 citation statements)

References 13 publications

“…Koenecke et al (2020) compared industrial ASR systems and found a 35% word error rate for African American English compared to 19% for white speakers of American English. There has also been research on how elderly speakers are transcribed with more errors (Pellegrini et al, 2012; Vipperla et al, 2008); Gretter et al (2020) document lower performance when transcribing non-native speech in English and German, and while there is research on transcription of code-switching (Diwan et al, 2021; Li et al, 2019; Seki et al, 2018; Yue et al, 2019), this remains a weakness for most systems. These types of sociolinguistic variables are amongst the ones researchers are most interested in, so one has to be aware of potential differences in ASR performance on the specific data of interest (Hooker, 2021; Martin & Tang, 2020).…”

Section: Automatic Speech Recognition For Sociophonetics (mentioning)

confidence: 99%

Computational sociophonetics using automatic speech recognition

Coto-Solano

2022

Language and Linguistics Compass

Recent years have seen numerous advances in natural language processing that can help accelerate sociophonetic work. These include software to align speech recordings with their transcriptions, as well as to transcribe audio automatically. This solves a major bottleneck and will help process larger datasets and test hypotheses more efficiently. This paper will summarise recent progress, highlight relevant examples of sociophonetic research, and comment on the technical and ethical issues at the cutting edge of natural language processing.

“…The Indic ASR challenge 2021 [2,3] consists of two sub-tasks. In sub-task 1, the main objective is to build a multilingual ASR system for Indian languages.…”

Section: Indic ASR Challenge 2021 (mentioning)

confidence: 99%

“…For sub-task 2, in addition to these models, an E2E conformer [20] model was also used as a baseline. The details of these baseline models can be found in [3].…”

Section: Implementation Details (mentioning)

confidence: 99%

Dual Script E2E Framework for Multilingual and Code-Switching ASR

Kumar, Kuriakose, Thyagachandran et al.

2021

Interspeech 2021

India is home to multiple languages, and training automatic speech recognition (ASR) systems is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this paper, we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven backend to recover the native language script. In the second system, we propose a modification to the E2E model, wherein the CLS representation and the native language characters are used simultaneously for training. We show our results on the multilingual and code-switching tasks of Indic ASR Challenge 2021. Our best results achieve ≈ 6% and 5% improvement in word error rate over the baseline system for the multilingual and code-switching tasks, respectively, on the challenge development data.

“…However, most of these large scale models skew towards high-resourced languages [9] and do not seek to directly optimize for intra-sentential CS ASR between particular language pairs. A more promising direction towards zero-shot CS ASR can be found in prior works which seek to incorporate monolingual data directly to improve CS performance [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. In particular, there are several works which achieve joint modeling of CS and monolingual ASR by conditionally factorizing the overall bilingual task into monolingual parts [29][30][31].…”

Section: Introduction (mentioning)

confidence: 99%

Towards Zero-Shot Code-Switched Speech Recognition

Yan, Wiesner, Klejch et al.

2022

Preprint

In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, these methods require the monolingual modules to perform language segmentation. That is, each monolingual module has to simultaneously detect CS points and transcribe speech segments of one language while ignoring those of other languages, which is not a trivial task. We propose to simplify each monolingual module by allowing them to transcribe all speech segments indiscriminately with a monolingual script (i.e. transliteration). This simple modification passes the responsibility of CS point detection to subsequent bilingual modules which determine the final output by considering multiple monolingual transliterations along with external language model information. We apply this transliteration-based approach in an end-to-end differentiable neural network and demonstrate its efficacy for zero-shot CS ASR on Mandarin-English SEAME test sets.
