Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications 2017
DOI: 10.18653/v1/w17-5042

CIC-FBK Approach to Native Language Identification

Abstract: We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research, i.e., word n-grams, lemma n-grams, part-of-speech n-grams, and function words, with recently introduced character n-grams from misspelled words, and features that are novel in this task, such as typed character n-grams and syntactic n-grams of words and of syntactic relation tags. We use a log-entropy weighting scheme and perform clas…
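The log-entropy weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common textbook formulation (local weight log(1 + tf), global weight 1 + Σ p·log p / log N, with p the share of a feature's total count that falls in one document); the exact variant used by the CIC-FBK system is not given here, and the function name `log_entropy_weights` is hypothetical.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Weight features with a textbook log-entropy scheme (an assumed variant).

    docs: list of documents, each a list of feature strings (e.g. n-grams).
    Returns one dict per document mapping feature -> weight.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Total frequency of each feature over the whole collection.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    # Global weight: 1 + sum_j p_ij * log(p_ij) / log(N).
    # Features spread evenly over all documents get a weight near 0,
    # features concentrated in a single document get a weight of 1.
    global_weight = {}
    for feat, gf in global_freq.items():
        entropy_sum = 0.0
        for c in counts:
            tf = c.get(feat, 0)
            if tf:
                p = tf / gf
                entropy_sum += p * math.log(p)
        if n_docs > 1:
            global_weight[feat] = 1.0 + entropy_sum / math.log(n_docs)
        else:
            global_weight[feat] = 1.0  # entropy is undefined for one document

    # Final weight: global weight times the local weight log(1 + tf).
    return [
        {feat: global_weight[feat] * math.log(1 + tf) for feat, tf in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    toy = [["the", "cat"], ["the", "dog"]]
    for weights in log_entropy_weights(toy):
        print(weights)
```

In this toy collection, "the" occurs equally in both documents and is weighted down to zero, while "cat" and "dog" each keep their full local weight.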

Cited by 19 publications (13 citation statements)
References 17 publications
“…Their experiments indicate that inclusion of the sentence prediction features provides a small increase in performance. (Markov et al, 2017) build an SVM with multiple lexical and syntactic features. They introduce two new feature types (typed character n-grams and syntactic n-grams) and combine them with word, lemma, and POS n-grams, function words, and spelling error character n-grams.…”
Section: Essay-only Track (mentioning)
confidence: 99%
“…The first classifier based on logistic regression works at the sentences level; results are provided to the second classifier which uses the support vector method and already works at the level of the whole text. The second result (88.08%) was shown by the CIC-FBK team [Markov et al, 2017] also used the support vector machine based on the standard set of features, such as symbolic, vocabulary and POS n-grams, functional words. In addition to these features, several new features including syntactic n-grams were also used.…”
Section: Related Work (mentioning)
confidence: 97%
“…3.3.1 Part-of-speech tags and function words: POS features capture the morpho-syntactic patterns in a text, and are indicative of the L1, especially when used in combination with other types of features (Cimino and Dell'Orletta, 2017; Markov et al, 2017). POS tags were obtained with TreeTagger (Schmid, 1999), which uses the Penn Treebank tagset (36 tags).…”
Section: Features (mentioning)
confidence: 99%
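
The POS n-gram features referred to in the statement above can be sketched as follows. This is an illustrative example only: it assumes the text has already been tagged (for instance with Penn Treebank tags), and the function name `pos_ngrams` is hypothetical rather than taken from any of the cited systems.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Count contiguous POS-tag n-grams in an already tagged sentence.

    tags: list of POS tag strings, one per token.
    n: n-gram length (n >= 1).
    """
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    # Hand-written Penn Treebank style tags for a five-token sentence.
    tags = ["DT", "NN", "VBZ", "DT", "NN"]
    print(pos_ngrams(tags, 2))
```

Counts such as these would typically be weighted and concatenated with the other feature types before classification.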