2018
DOI: 10.1075/ijcl.16080.hua

Dependency parsing of learner English

Abstract: Current syntactic annotation of large-scale learner corpora mainly resorts to “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and application related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are three-fold. Firstly, we demonstrate that the common practice of constructing a gold standard – by manually correcting the pre…

Cited by 56 publications (25 citation statements)
References 16 publications
“…Similarly, morphosyntactic error types may be identified by using POS patterns. [14] reports the robustness of parsers when analysing learner data, and that dependency parsing is more sensitive to errors than PoS-tagging. Conversely, error types such as verb tense remain challenging in terms of implementation due to the semantic value of contexts.…”
Section: Conclusion and Future Research
Citation type: mentioning
confidence: 99%
“…We collect data from the EF Cambridge Open Language Database (EFCamDat), a longitudinal corpus consisting of 1.8 million English learner texts submitted by 174,000 students enrolled in a virtual learning environment (Huang et al, 2017). The corpus spans sixteen proficiency levels aligned with standards such as the Common European Framework of Reference for languages.…”
Section: Corpus
Citation type: mentioning
confidence: 99%
“…The texts include metadata on the learner's proficiency level and nationality acts as a proxy to native language background. The texts are annotated with part of speech (POS) tags using the Penn Treebank Tagset, grammatical relationships using SyntaxNet, and some texts in the corpus (66%) include error annotations provided by teachers using predetermined error markup tags (Huang et al, 2017).…”
Section: Corpus
Citation type: mentioning
confidence: 99%
“…The solution: a case study. The original database. The EF Cambridge Open Language Database (EFCAMDAT) is the largest open-access L2 English learner database, containing ~1,180,000 texts written by ~174,000 learners from various nationalities (Geertzen et al, 2013; Huang et al, 2017, 2018). It contains data that was gathered from an online educational platform, which was run in many different countries over the course of years.…”
Citation type: mentioning
confidence: 99%