The LIDES Coding Manual

Barnett, R.; Codó, Eva; Eppler, Eva Duran; Forcadell, Montse; Gardner‐Chloros, Penelope; Hout, R.W.N.M. van; Moyer, Melissa G.; Torras, Maria Carme; Turell, M. Teresa; Sebba, Mark; Starren, Marianne; Wensing, S.

doi:10.1177/13670069000040020101

Cited by 34 publications

(11 citation statements)

References 0 publications


“…Here, we showcase the different perspectives that the treatment of lone items provides on CS. Barnett et al (2000) developed the Multilingual-Index (M-Index) as a measure of the multilinguality of different corpora, or the distribution of languages in a corpus. Guzman et al (2017) also created the Integration-Index (I-Index), which is meant to measure the probability of CS in different multilingual corpora.…”

Section: Related Methods (mentioning)

confidence: 99%

Code-Switching Metrics Using Intonation Units

Pattichis, LaCasse, Trawick et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


Code-switching (CS) metrics in NLP that are based on word-level units are misaligned with true bilingual CS behavior. Crucially, CS is not equally likely between any two words, but follows syntactic and prosodic rules. We adapt two metrics, multilinguality and CS probability, and apply them to transcribed bilingual speech, for the first time putting forward Intonation Units (IUs), which are prosodic speech segments, as basic tokens for NLP tasks. In addition, we calculate these two metrics separately for distinct mixing types: alternating-language multi-word strings and single-word incorporations from one language into another. Results indicate that individual differences according to the two CS metrics are independent. However, there is a shared tendency among bilinguals for multi-word CS to occur across, rather than within, IU boundaries. That is, bilinguals tend to prosodically separate their two languages. This constraint is blurred when metric calculations do not distinguish multi-word and single-word items. These results call for a reconsideration of units of analysis in future development of CS datasets for NLP tasks.
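The two metrics named in this abstract and in the citation statement above (multilinguality, or M-Index, and CS probability, or I-Index) can be illustrated with a minimal sketch. It assumes the formulations commonly given in later work: M = (1 − Σ p_j²) / ((k − 1) · Σ p_j²), where p_j is the share of tokens in language j and k is the number of languages, and I = the fraction of adjacent token pairs whose language tags differ. The function names and the toy tag sequences are hypothetical. The tokens may be words or, as the abstract proposes, Intonation Units.

```python
from collections import Counter


def m_index(tags, k=None):
    """Multilingual Index: 0.0 for monolingual data, 1.0 for an even split.

    `tags` is a list of per-token language labels (hypothetical input format).
    `k` is the number of languages; by default, the number seen in `tags`.
    """
    if not tags:
        return 0.0
    counts = Counter(tags)
    if k is None:
        k = len(counts)
    if k < 2:
        return 0.0  # a single language cannot be "multilingual"
    total = sum(counts.values())
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)


def i_index(tags):
    """Integration Index: share of adjacent token pairs that switch language."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)


# Toy sequences (invented for illustration, not taken from any corpus).
balanced = ["en", "es", "en", "es"]
blocked = ["en", "en", "es", "es"]
print(m_index(balanced), i_index(balanced))
print(m_index(blocked), i_index(blocked))
```

Both toy sequences have the same language distribution, so they share an M-Index, while their I-Index values differ because the switches are placed differently. This is one way to see why the two metrics can vary independently.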



“…Data was gathered at various times for each family: from 2003-2008 (family in France) and from 2007-2010 (families in Norway and Finland). The transcription of data collected was carried out on the pattern from LIDES coding manual (Barnett et al 2000). It seems appropriate to follow this transcription guideline for the members of Indian communities who have three to four languages in their verbal repertoire.…”

Section: Methodology and Hypotheses (mentioning)

confidence: 99%

Truncated multilingual repertoire in Indian migrant families in three cities of Europe

Haque 2011

ESUKA-JEFUL


This paper outlines a case study of language choices among three Indian immigrant families residing in three different countries of Europe – France, Norway, and Finland. The issue is to focus on the language practices of the immigrant members when the verbal repertoire is composed of several languages owing to truncated competencies. What are the languages at their disposal and how do they exploit their truncated multilingual repertoire within different social settings? The sociolinguistic and ethnographic empirical observations have yielded useful information on the multiple usages of languages by our participants. This paper reveals how several languages are handled in daily life with limited or high competence in each of them, displaying the intricacies of truncated multilingual repertoire. The latter has been found to play a pivotal role in situating language practices with different segments of the society.


“…Therefore, traditional supervised evaluation metrics (BLEU [32], Rouge [33]) cannot be used directly to evaluate the personification aspects of code-mixed generation models. Similarly, other extrinsic evaluation measures such as Multilingual index (M Index) [34], Burstiness and Span Entropy [35] cannot be used, as these metrics are predominantly used to evaluate the ability to capture corpus-level switching patterns of generative models. To overcome the limitations of the existing evaluation metrics, we propose four metrics for benchmarking generated code-mixed texts against the historical utterances by different users.…”

Section: B Evaluation Metrics (mentioning)

confidence: 99%

Persona-Aware Generative Model for Code-Mixed Language

Sengupta, Akhtar, Chakraborty 2023

Preprint

