Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications 2019
DOI: 10.18653/v1/w19-4429

Anglicized Words and Misspelled Cognates in Native Language Identification

Abstract: In this paper, we present experiments that estimate the impact of specific lexical choices of people writing in a second language (L2). In particular, we look at misspelled words that indicate lexical uncertainty on the part of the author, and separate them into three categories: misspelled cognates, "L2-ed" (in our case, anglicized) words, and all other spelling errors. We test the assumption that such errors contain clues about the native language of an essay's author through the task of native language iden…

Cited by 2 publications (2 citation statements)
References 17 publications

“…NLI is usually approached from a machine learning perspective as a multi-class classification problem of assigning class labels representing L1s to texts written in L2, where the main focus (of the traditional machine learning) is to design features that capture the systematic fingerprints of the first language in the second language writing (native language interference (Odlin, 1989)). Numerous feature types that capture various aspects of the interference phenomenon have been explored for NLI: spelling errors (Koppel et al., 2005); lexical features, e.g., word and lemma n-grams (Jarvis et al., 2013), cognates (Markov et al., 2019), etymologically-related words; syntactic features, e.g., context-free grammar features (Wong and Dras, 2011), Stanford parser dependency features (Tetreault et al., 2012); stylometric features, e.g., punctuation (Markov et al., 2018a), character n-gram features (Kulmizev et al., 2017); emotion-based features (Markov et al., 2018b), etc. The combination of such features provides the best results for NLI, as shown by the two shared tasks on the NLI task organized in 2013 and 2017 (Malmasi et al., 2017), where the two top-ranked systems (Cimino and Dell'Orletta, 2017; Markov et al., 2017) used Support Vector Machines (SVM) with a variety of engineered features.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Within traditional machine learning, NLI is usually approached as a multi-class classification problem of assigning class labels representing L1s to texts written in L2, where the main focus is to design features that capture the systematic fingerprints of the first language in the second language writing (native language interference (Odlin, 1989)). These features include: spelling errors (Koppel et al., 2005; Chen et al., 2017); lexical features, e.g., word and lemma n-grams (Jarvis et al., 2013), cognates (Markov et al., 2019), etymologically-related words; syntactic features, e.g., context-free grammar features (Wong and Dras, 2011), Stanford parser dependency features (Tetreault et al., 2012); stylometric features, e.g., punctuation (Markov et al., 2018a), character n-gram features (Kulmizev et al., 2017); emotion-based features (Markov et al., 2018b), etc. The combination of such features provides the best results for NLI, as shown by the two shared tasks organized in 2013 and 2017 (Malmasi et al., 2017), where the two top-ranked systems (Cimino and Dell'Orletta, 2017; Markov et al., 2017) used Support Vector Machines (SVM) with a variety of engineered features.…”
Section: Background and Motivation (citation type: mentioning)
confidence: 99%
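
The citation statements above frame NLI as multi-class classification over engineered features such as character n-grams, typically with SVMs. The following is a minimal sketch of that framing only, assuming Python and nothing beyond the standard library. It deliberately swaps the SVM for a much simpler nearest-profile classifier based on cosine similarity, and the function names and the two toy essays are invented for illustration; none of it is taken from the cited papers or their data.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_texts, n=3):
    """Build one summed n-gram profile per L1 label (not an SVM)."""
    profiles = {}
    for label, text in labelled_texts:
        profiles.setdefault(label, Counter()).update(char_ngrams(text, n))
    return profiles


def predict(profiles, text, n=3):
    """Return the L1 label whose profile is most similar to the text."""
    features = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(features, profiles[label]))


# Invented toy data: (hypothetical L1 label, made-up L2 English sentence).
TOY_ESSAYS = [
    ("es", "I have an important responsability in my familly and I am agree."),
    ("de", "I become a letter from my friend and we make a party tomorrow."),
]

profiles = train(TOY_ESSAYS)
guess = predict(profiles, "My familly gives me a big responsability.")
```

With realistic data, the per-class profiles would be replaced by a trained classifier and the single n-gram feature set by a combination of feature types, as the statements describe; the sketch only shows how texts become feature vectors and how a label is chosen among several L1 classes.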