Interspeech 2017
DOI: 10.21437/interspeech.2017-1429

Metrics for Modeling Code-Switching Across Corpora

Abstract: In developing technologies for code-switched speech, it would be desirable to be able to predict how much language mixing might be expected in the signal and the regularity with which it might occur. In this work, we offer various metrics that allow for the classification and visualization of multilingual corpora according to the ratio of languages represented, the probability of switching between them, and the time-course of switching. Applying these metrics to corpora of different languages and genres, we fi…

Cited by 45 publications (46 citation statements)
References 22 publications
“…Recent studies have focused on empirical measurements of code-switching (Guzmán et al., 2017). The multilingual index (M-Index), Language Entropy and Integration index (I-index) measure the extent of mixing and switching frequency.…”
Section: Discussion (mentioning)
confidence: 99%
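
The three measures named in this statement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' implementation: it assumes the corpus is already a flat list of per-token language tags, it takes the number of languages k to be the number of distinct tags observed, and the function names are hypothetical. The M-Index follows the commonly cited form (1 - Σ p_j²) / ((k - 1) Σ p_j²), Language Entropy is the Shannon entropy of the language proportions in bits, and the I-index is the fraction of adjacent token pairs whose languages differ.

```python
import math
from collections import Counter


def language_proportions(tags):
    """Return the share of tokens carried by each language tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {lang: c / total for lang, c in counts.items()}


def m_index(tags):
    """M-Index: 0 for a monolingual corpus, 1 for perfectly balanced languages.

    Assumption: k is the number of distinct tags seen in `tags`.
    """
    p = language_proportions(tags)
    k = len(p)
    if k < 2:
        return 0.0
    sum_sq = sum(v * v for v in p.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)


def language_entropy(tags):
    """Shannon entropy (bits) of the language distribution."""
    p = language_proportions(tags)
    return -sum(v * math.log2(v) for v in p.values())


def i_index(tags):
    """I-index: fraction of adjacent token pairs that switch language."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)
```

For the toy sequence `["en", "en", "es", "es"]`, the two languages are balanced, so `m_index` and `language_entropy` both return 1.0, while `i_index` returns 1/3 because only one of the three adjacent pairs is a switch. The contrast shows why the measures are reported together: the first two describe how much of each language is present, and the third describes how often speakers move between them.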
“…Here we analyse the features like length distribution and diversity of code-switching of generated synthetic texts. We also measured one sentence level metric Code-Mixing Index (CMI) coined by [10], and three corpus level metrics Multilingual index (M-Index), Burstiness and Span Entropy that were introduced in [13] to demonstrate how different the generated texts are from the training corpus in terms of switching.…”
Section: Direct/intrinsic Evaluation (mentioning)
confidence: 99%
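
The two time-course metrics mentioned here, Burstiness and Span Entropy, both operate on the lengths of monolingual spans (runs of consecutive tokens in the same language). The sketch below is an illustration under assumptions rather than the procedure used in the citing or cited work: the input is assumed to be a flat list of per-token language tags, Burstiness uses the form (σ - μ) / (σ + μ) over span lengths with the population standard deviation (the choice of population versus sample deviation is an assumption), and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby
from statistics import mean, pstdev


def span_lengths(tags):
    """Lengths of maximal runs of identical consecutive language tags."""
    return [sum(1 for _ in group) for _, group in groupby(tags)]


def burstiness(tags):
    """Burstiness in [-1, 1]: -1 for perfectly regular spans, higher for bursty ones.

    Assumption: population standard deviation of span lengths.
    """
    spans = span_lengths(tags)
    if not spans:
        return 0.0
    mu, sigma = mean(spans), pstdev(spans)
    return (sigma - mu) / (sigma + mu)


def span_entropy(tags):
    """Shannon entropy (bits) of the distribution of span lengths."""
    spans = span_lengths(tags)
    total = len(spans)
    counts = Counter(spans)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence that switches at perfectly regular intervals, such as `["en", "en", "es", "es", "en", "en"]`, has spans of length 2 only, so `burstiness` returns -1.0 and `span_entropy` is zero. Irregular switching spreads the span lengths out, which raises both values.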
“…• Importantly, bilingual speech practices are complex and it is not clear that the traditional binary typology of insertional and alternational C-S, while useful as a heuristic, is adequate to characterize the nature of C-S (Auer and Muhamedova, 2005). There have been recent attempts to quantify mixing complexity with the aim of arriving at empirically reliable comparisons of C-S between corpora (Gambäck and Das, 2016; Das and Gambäck, 2014; Jamatia et al., 2015; Guzman et al., 2016; Guzmán et al., 2017a). Each aims to capture the fact that C-S may vary along multiple planes.…”
Section: • Sentence (mentioning)
confidence: 99%