Abstract

The CoNLL-2014 shared task was devoted to grammatical error correction of all error types. In this paper, we give the task definition, present the data sets, and describe the evaluation metric and scorer used in the shared task. We also give an overview of the various approaches adopted by the participating teams, and present the evaluation results. Compared to the CoNLL-2013 shared task, we introduced the following changes in CoNLL-2014: (1) a participating system is expected to detect and correct grammatical errors of all types, instead of just the five error types in CoNLL-2013; (2) the evaluation metric was changed from F1 to F0.5, to emphasize precision over recall; and (3) the test essays were annotated independently by two human annotators, compared to just one human annotator in CoNLL-2013.
Introduction

This volume contains papers describing the CoNLL-2014 shared task and the participating systems. This year, we continue the tradition of the Conference on Computational Natural Language Learning (CoNLL) of hosting a high-profile shared task in natural language processing, centered on automatic grammatical error correction of English essays. Grammatical error correction is an impactful task: hundreds of millions of people worldwide are estimated to be learning English as a second language, and they would benefit directly from an automated grammar checker.

This task is a continuation of the CoNLL-2013 shared task. There is only one track, in which participants are provided with an annotated training corpus but are allowed to use additional resources as long as they are publicly available. The training corpus, NUCLE (the NUS Corpus of Learner English), is a large collection of English essays written by students at the National University of Singapore (NUS) who are non-native speakers of English. The essays were annotated by professional English instructors at NUS. As in other shared tasks, we provide a common test set with gold-standard annotations, and a scorer to evaluate the submitted system output. This year's shared task requires a participating system to correct all error types present in an essay, instead of only the five error types covered in the CoNLL-2013 shared task. In addition, the evaluation metric has been changed to F0.5, weighting precision twice as much as recall.

A total of 13 participating teams submitted system output, and 12 of them submitted system description papers. Many different approaches were adopted to perform grammatical error correction. We hope that these approaches help to advance the state of the art in grammatical error correction, and that the test set and scorer, which remain freely available after the shared task, will be useful resources for those interested in grammatical error correction.
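To make the metric concrete, the following minimal Python sketch computes F0.5 from edit-level precision and recall. It is an illustration only, not the official scorer: the helper names are hypothetical, and the actual scorer determines the set of matching edits by aligning system output against the gold-standard annotations, which is more involved than the simple set intersection assumed here.

    def f_beta(precision, recall, beta=0.5):
        # Weighted F-measure; beta < 1 weights precision more than recall.
        if precision == 0.0 and recall == 0.0:
            return 0.0
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    def precision_recall(system_edits, gold_edits):
        # Simplified edit-level counts (hypothetical helper; the official
        # scorer matches edits via alignment, not plain set intersection).
        matched = len(set(system_edits) & set(gold_edits))
        p = matched / len(system_edits) if system_edits else 1.0
        r = matched / len(gold_edits) if gold_edits else 1.0
        return p, r

    # Example: with precision 0.40 and recall 0.20,
    # F0.5 = 1.25 * 0.40 * 0.20 / (0.25 * 0.40 + 0.20) = 0.333,
    # whereas F1 would be only about 0.267. F0.5 thus rewards the more
    # precise system even at lower recall.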