Dependency parsing of learner English

Huang, Yan; Murakami, Akira; Alexopoulou, Theodora; Korhonen, Anna-Leena

doi:10.1075/ijcl.16080.hua

Cited by 56 publications

(25 citation statements)

References 16 publications

“…Similarly, morphosyntactic error types may be identified by using POS patterns. [14] reports the robustness of parsers when analysing learner data, and that dependency parsing is more sensitive to errors than PoS-tagging. Conversely, error types such as verb tense remain challenging in terms of implementation due to the semantic value of contexts.…”

Section: Conclusion and Future Research (mentioning)

confidence: 99%

A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors

Ballier, Gaillat, Simpkin, et al. 2019

Lecture Notes in Computer Science

This paper focuses on the use of technology in language learning. Language training requires grouping learners homogeneously and providing them with instant feedback on their productions, such as errors [8, 15, 17] or proficiency levels. A possible approach is to assess writings from students and assign them a level. This paper analyses the possibility of automatically predicting Common European Framework of Reference (CEFR) language levels on the basis of manually annotated errors in a written learner corpus [9, 11]. The aim is to evaluate the predictive power of errors in terms of levels and to identify which error types appear to be criterial features in determining interlanguage stages. Results show that specific errors such as punctuation, spelling and verb tense are significant at specific CEFR levels.

“…We collect data from the EF Cambridge Open Language Database (EFCamDat), a longitudinal corpus consisting of 1.8 million English learner texts submitted by 174,000 students enrolled in a virtual learning environment (Huang et al, 2017). The corpus spans sixteen proficiency levels aligned with standards such as the Common European Framework of Reference for languages.…”

Section: Corpus (mentioning)

confidence: 99%

“…The texts include metadata on the learner's proficiency level and nationality acts as a proxy to native language background. The texts are annotated with part of speech (POS) tags using the Penn Treebank Tagset, grammatical relationships using SyntaxNet, and some texts in the corpus (66%) include error annotations provided by teachers using predetermined error markup tags (Huang et al, 2017).…”

Section: Corpus (mentioning)

confidence: 99%

Using Classifier Features to Determine Language Transfer on Morphemes

Lavrentovich 2018

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: St

The aim of this thesis is to perform a Native Language Identification (NLI) task where we identify an English learner's native language background based only on the learner's English writing samples. We focus on the use of English grammatical morphemes across four proficiency levels. The outcome of the computational task is connected to a position in second language acquisition research that holds all learners acquire English grammatical morphemes in the same order, regardless of native language background. We use the NLI task as a tool to uncover cross-linguistic influence on the developmental trajectory of morphemes. We perform a cross-corpus evaluation across proficiency levels to increase the reliability and validity of the linguistic features that predict the native language background. We include native English data to determine the different morpheme patterns used by native versus non-native English speakers. Furthermore, we conduct a human NLI task to determine the type and magnitude of language transfer cues used by human judges versus the classifier.

“…The solution: a case study. The original database. The EF Cambridge Open Language Database (EFCAMDAT) is the largest open-access L2 English learner database, containing ~1,180,000 texts written by ~174,000 learners from various nationalities (Geertzen et al, 2013; Huang et al, 2017, 2018). It contains data that was gathered from an online educational platform, which was run in many different countries over the course of years.…”

mentioning

confidence: 99%

Working with Data from Real-World Corpora: A Case Study on Identifying Issues and Using Scalable Solutions

Shatz 2020

Preprint

The opportunity: new large-scale datasets. Language corpora are increasingly being based on data from social and educational online platforms, and the size of these new corpora allows researchers to analyze language use in ways that were not previously possible (Callies, 2015; McEnery et al., 2019). The challenge: organize and clean large amounts of data. Online platforms generally do not collect data with linguistic research in mind, so their data is often "messy" or "dirty" in various ways. Researchers must therefore develop new approaches for organizing and cleaning this data. Such approaches should generally be scalable, due to the size of these datasets, so they should rely primarily on quantitative and NLP-based techniques.
