Proceedings of the First Workshop on Computational Approaches to Code Switching 2014
DOI: 10.3115/v1/w14-3907

Overview for the First Shared Task on Language Identification in Code-Switched Data

Abstract: We present an overview of the first shared task on language identification on code-switched data. The shared task included code-switched data from four language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA), Mandarin-English (MAN-EN), Nepali-English (NEP-EN), and Spanish-English (SPA-EN). A total of seven teams participated in the task and submitted 42 system runs. The evaluation showed that language identification at the token level is more difficult when the languages present are closely related, as in…

Cited by 152 publications (195 citation statements)
References 8 publications

“…The shared task for "Language Identification in Code-Switched Data" (Solorio et al, 2014) aims at allowing participants to perform word-level language identification in code-switched Spanish-English, MSA-DA, Chinese-English and Nepalese-English data. In this work, we only focus on MSA-DA data.…”
Section: Shared Task Description
mentioning
confidence: 99%
“…The system relies on Language Models and a tool for morphological analysis and disambiguation for Arabic to identify the class of each word in a given sentence. We evaluate the performance of our system on the test datasets of the shared task at the EMNLP workshop on Computational Approaches to Code Switching (Solorio et al, 2014). The system yields an average token-level F_{β=1} score of 93.6%, 77.7% and 80.1%, on the first, second, and surprise-genre test-sets, respectively, and a tweet-level F_{β=1} score of 4.4%, 36% and 27.7%, on the same test-sets.…”
mentioning
confidence: 99%
“…This paper describes DCU-UVT's participation in the shared task Language Identification in Code-Switched Data (Solorio et al, 2014) at the Workshop on Computational Approaches to Code Switching, EMNLP, 2014. The task is to make word-level predictions (six labels: lang1, lang2, ne, mixed, ambiguous and other) for mixed-language user generated content.…”
Section: Introduction
mentioning
confidence: 99%
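
The word-level setup quoted above (one of six labels per token) can be illustrated with a minimal sketch of per-label F1 scoring. This is not the shared task's official scorer: the function name `per_label_f1`, the example sentence, and its gold and predicted labels are all hypothetical, and only the six label names come from the citation statement.

```python
from collections import Counter

# The six labels named in the citation statement above.
LABELS = ("lang1", "lang2", "ne", "mixed", "ambiguous", "other")


def per_label_f1(gold, pred):
    """Return {label: F1} for two equal-length sequences of token labels.

    Hypothetical helper for illustration; not the official evaluation script.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed
    scores = {}
    for label in LABELS:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Invented example: one token-per-label annotation with a single tagging error.
tokens = ["I", "love", "tacos", "de", "Juan", "!"]
gold = ["lang1", "lang1", "lang2", "lang2", "ne", "other"]
pred = ["lang1", "lang1", "lang1", "lang2", "ne", "other"]

scores = per_label_f1(gold, pred)
for label in LABELS:
    print(f"{label:10s} {scores[label]:.3f}")
```

In this invented example the single error (a lang2 token tagged as lang1) lowers the score of both labels involved, which is why per-label scores are often reported alongside overall accuracy.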
“…The main aim of this paper is to describe my system submission to the Computational Approaches to Code Switching task (Solorio et al, 2014). The training dataset provided for the classification task were tweets composed of Spanish and English words or Nepali and English words.…”
Section: Introduction
mentioning
confidence: 99%