Interspeech 2019
DOI: 10.21437/interspeech.2019-1959
Exploiting Monolingual Speech Corpora for Code-Mixed Speech Recognition

Abstract: One of the main challenges in building code-mixed ASR systems is the lack of annotated speech data. Often, however, monolingual speech corpora are available in abundance for the languages in the code-mixed speech. In this paper, we explore different techniques that use monolingual speech to create synthetic code-mixed speech and examine their effect on training models for code-mixed ASR. We assume access to a small amount of real code-mixed text, from which we extract probability distributions that govern the …
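As a rough illustration of the idea in the abstract (a minimal sketch under my own assumptions, not the paper's method: `switch_probability`, `synthesize`, and the token-level view of "segments" are all hypothetical), one can estimate a language-switch probability from a small real code-mixed sample and use it to splice monolingual material into synthetic code-mixed sequences:

```python
import random

def switch_probability(cm_sentences, lang_of):
    """Estimate P(next token is in the other language) from real CM text."""
    switches, transitions = 0, 0
    for sent in cm_sentences:
        for prev, cur in zip(sent, sent[1:]):
            transitions += 1
            if lang_of(prev) != lang_of(cur):
                switches += 1
    return switches / transitions if transitions else 0.0

def synthesize(mono_a, mono_b, p_switch, length, rng):
    """Splice tokens from two monolingual pools, switching with p_switch."""
    pools = {"A": list(mono_a), "B": list(mono_b)}
    lang = rng.choice(["A", "B"])
    out = []
    for _ in range(length):
        out.append(rng.choice(pools[lang]))
        if rng.random() < p_switch:  # switch language per the estimated rate
            lang = "B" if lang == "A" else "A"
    return out
```

The same scheme transfers to speech by replacing token pools with monolingual audio segments; the estimated distribution then governs where segment boundaries (switch points) fall.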

Cited by 13 publications (11 citation statements)
References 22 publications
“…However, utilizing these monolingual corpora is challenging [30] owing to factors like pronunciation shift, accent shift and phone influence that occur in code-switched data [9,31]. Previous work [15,32] has attempted different strategies to overcome this issue. Inspired by the model in [15], we propose a multi-label/multi-audio encoder for handling monolingual corpora in transformer-transducer, as depicted on the right side of Figure 1.…”
Section: Leveraging Monolingual Corpora for Code-Switched ASR
confidence: 99%
See 1 more Smart Citation
“…However, utilizing these monolingual corpora is challenging [30] owing to factors like pronunciation shift, accent shift and phone influence that occur in code-switched data [9,31]. Previous work [15,32] has attempted different strategies to overcome this issue. Inspired by the model in [15], we propose a multi-label/multi-audio encoder for handling monolingual corpora in transformer-transducer, as depicted on the right side of Figure 1.…”
Section: Leveraging Monolingual Corpora For Code-switched Asrmentioning
confidence: 99%
“…Code-Switching Background The first speech recognizer for codeswitched data [38] was trained on the SEAME corpus [34]. They looked at phone-merging techniques to handle the two languages in acoustic modeling, explored further in [9,31], and generating codeswitched text data for language modeling, studied more in [32,39]. Since then, different approaches have been applied to improve codeswitched speech recognition like speech chains [40], transliteration [41], and translation [42].…”
Section: Relation to Prior Work
confidence: 99%
“…In contrast, ours is a much simpler sequence labeling formulation. Taneja et al (2019) proposes to splice fragments chosen from monolingual corpus of two languages based on statistics of length distribution and phone transition. Pratapa et al (2018) use Equivalence Constraint Theory to define rules on top of parse trees of two sentences to create grammatically valid artificial CM data.…”
Section: Related Work
confidence: 99%
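The splicing strategy the quote above attributes to Taneja et al. (2019) — concatenating fragments from two monolingual corpora guided by length statistics — can be sketched roughly as follows. This is a hedged illustration under my own assumptions, not the authors' code; `splice`, the alternating-language policy, and sampling lengths by resampling observed values are all simplifications:

```python
import random

def splice(audio_a, audio_b, empirical_lengths, n_fragments, rng):
    """Alternate fragments from two monolingual signals (lists of samples),
    drawing each fragment length from an empirical length distribution."""
    sources = [audio_a, audio_b]
    out, lang = [], 0
    for _ in range(n_fragments):
        n = rng.choice(empirical_lengths)  # resample an observed length
        start = rng.randrange(max(len(sources[lang]) - n, 1))
        out.extend(sources[lang][start:start + n])
        lang = 1 - lang  # switch to the other language's corpus
    return out
```

A real implementation would additionally respect phone-transition statistics at the splice points and cut at phone boundaries rather than at arbitrary sample offsets.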
“…Apart from techniques targeted at CS ASR, several other data augmentation techniques have been shown to boost the performance of ASR systems such as speed perturbation [17], randomized spectrogram masking [18], vocal tract length perturbation and semi-supervised learning [19]. Due to the lack of sufficient CS labelled corpora for training, [20,21] train ASR systems using synthetic datasets created by concatenating monolingual segments from the constituent languages of the CS speech. In contrast, we explore the use of samples generated from a TTS system for training our ASR system.…”
Section: Related Work
confidence: 99%
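Of the augmentation techniques the quote above lists, speed perturbation is the simplest to picture: the waveform is resampled so it plays faster or slower. A minimal sketch (my own naive linear-interpolation version; production systems typically use a proper resampler such as SoX or librosa):

```python
def speed_perturb(samples, factor):
    """Naive speed perturbation: resample by `factor` (>1 speeds up,
    shortening the signal) using linear interpolation between samples."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        t = i * factor            # fractional position in the input
        j = int(t)
        frac = t - j
        a = samples[min(j, len(samples) - 1)]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a * (1 - frac) + b * frac)
    return out
```

Training on copies perturbed at factors like 0.9, 1.0, and 1.1 is a common recipe; note this naive version also shifts pitch, which is usually acceptable for augmentation.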