2016 International Conference on Asian Language Processing (IALP)
DOI: 10.1109/ialp.2016.7875980

Developing learner corpus annotation for Chinese grammatical errors

Abstract: This study describes the construction of a TOCFL learner corpus and its usage for Chinese grammatical error diagnosis. We collected essays from the Test Of Chinese as a Foreign Language (TOCFL) and annotated grammatical errors using hierarchical tagging sets. Two kinds of error classifications were used simultaneously to tag grammatical errors. The first capital letter of each error tag denotes the coarse-grained surface differences, while the subsequent lowercase letters denote the fine-grained linguistic ca…
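
The tag layout described in the abstract (one capital letter for the coarse-grained class, followed by lowercase letters for the fine-grained category) could be split mechanically. The following is a minimal sketch under that reading only: the names `ErrorTag` and `split_tag` and the tag `Xab` are hypothetical illustrations, not taken from the corpus or from the authors' tooling.

```python
import re
from typing import NamedTuple


class ErrorTag(NamedTuple):
    coarse: str  # first capital letter: coarse-grained surface difference
    fine: str    # subsequent lowercase letters: fine-grained linguistic category


# Assumption: a tag is exactly one capital letter plus zero or more lowercase letters.
_TAG_PATTERN = re.compile(r"([A-Z])([a-z]*)")


def split_tag(tag: str) -> ErrorTag:
    """Split a hierarchical error tag into its coarse and fine parts."""
    match = _TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"unexpected tag shape: {tag!r}")
    return ErrorTag(coarse=match.group(1), fine=match.group(2))


# "Xab" is a made-up tag used only to show the split.
print(split_tag("Xab"))
```

Keeping the two levels in separate fields would let errors be counted at either granularity without re-parsing the tag string.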

Cited by 15 publications (10 citation statements)
References 22 publications

“…The learner corpora used in our shared task were taken from two sources: the writing section of the computer-based Test Of Chinese as a Foreign Language (TOCFL) (Lee et al, 2016) and the writing section of the Hanyu Shuiping Kaoshi (HSK, Test of Chinese Level) (Cui et al, 2011; Zhang et al, 2013). Native Chinese speakers were trained to manually annotate grammatical errors and provide corrections corresponding to each error.…”
Section: Datasets (mentioning)
confidence: 99%
“…Concerning language learning, there have been several research efforts that present the error diagnosis process can diagnose, among others, grammatical, syntactic, vocabulary mistakes by using techniques, such as approximate string matching, convolutional sequence to sequence modeling, context representation, etc. [13][14][15][16][17][18][19]. For example, the work of [19] proposes a sequence-to-sequence learning approach using recurrent neural networks for conducting error analysis and diagnosis.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the work of [14], the authors used the Clause Complex model to analyze the learners' errors emerging from grammatical differences in language learning. The work of [15] proposes a framework of hierarchical tagging sets to perform annotation of grammatical mistakes in language learning. Finally, the authors of [16] performed classification on spelling mistakes in two categories, i.e., orthographic and phonological errors.…”
Section: Introduction (mentioning)
confidence: 99%
“…For over a decade, user generated content (UGC) has been an important target of NLP technology. It is characterized by phenomena not found in standard texts, such as word lengthening (Brody and Diakopoulos, 2011), dialectal variations (Saito et al, 2017; Blodgett et al, 2016), unknown onomatopoeias (Sasano et al, 2013), grammatical errors (Mizumoto et al, 2011; Lee et al, 2018), and mother tongue interference in non-native writing (Goldin et al, 2018). Typographical errors (typos) also occur often in UGC.…”
Section: Introduction (mentioning)
confidence: 99%