Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
DOI: 10.3115/v1/w14-3622

The Columbia System in the QALB-2014 Shared Task on Arabic Error Correction

Abstract: The QALB-2014 shared task focuses on correcting errors in texts written in Modern Standard Arabic. In this paper, we describe the Columbia University entry in the shared task. Our system consists of several components that rely on machine-learning techniques and linguistic knowledge. We submitted three versions of the system: these share several core elements but each version also includes additional components. We describe our underlying approach and the special aspects of the different versions of our submiss…

Cited by 25 publications
References 17 publications
“…Since the errors are sparse, this feature causes the model to abstain from flagging mistakes, resulting in low recall. To avoid this problem, we adopt the approach proposed in Rozovskaya et al. (2012), the error inflation method, and add artificial article errors to the training data based on the error distribution on the training set. This method prevents the source feature from dominating the context features, and improves the recall of the system.…”
Section: The Baseline System
confidence: 99%
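The error-inflation idea quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the article inventory, the confusion-probability table, and the function name are all hypothetical, and in the actual system the probabilities would be estimated from the annotated training data.

```python
import random

# Hypothetical sketch of error inflation (after Rozovskaya et al., 2012):
# inject artificial article errors into clean training text according to a
# confusion distribution, so a classifier does not learn to always trust
# the observed (source) article and abstain from corrections.

def inflate_errors(tokens, confusion_probs, rng):
    """Replace each article with a confusable form sampled from its distribution.

    confusion_probs maps an observed article to (replacement, prob) pairs
    summing to 1.0; "" denotes deleting the article. Values are illustrative.
    """
    noisy = []
    for tok in tokens:
        key = tok.lower()
        if key in confusion_probs:
            r = rng.random()
            cumulative = 0.0
            for replacement, p in confusion_probs[key]:
                cumulative += p
                if r < cumulative:
                    tok = replacement
                    break
        if tok:  # drop tokens replaced by "" (simulated article deletion)
            noisy.append(tok)
    return noisy

rng = random.Random(0)  # fixed seed for reproducibility
probs = {"the": [("a", 0.05), ("", 0.05), ("the", 0.90)]}
print(inflate_errors("keep the model simple".split(), probs, rng))
```

Keeping the identity replacement ("the" stays "the") at 0.90 mirrors the quoted point: errors are injected at roughly the rate observed in the data, rather than everywhere.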
“…The article classifier is a discriminative model that draws on the state-of-the-art approach described in Rozovskaya et al. (2012). The model makes use of the Averaged Perceptron (AP) algorithm (Freund and Schapire, 1996) and is trained on the training data of the shared task with rich features.…”
Section: The Baseline System
confidence: 99%
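For readers unfamiliar with the Averaged Perceptron referenced in this excerpt, a minimal sketch follows. The feature names and toy data are illustrative assumptions; the shared-task system's rich feature set and multi-class setup are not reproduced here.

```python
from collections import defaultdict

# Minimal Averaged Perceptron (Freund and Schapire, 1996) for a binary
# decision such as "correct this article vs. leave it". Features are sparse
# dicts; labels are +1 / -1. Toy example only, not the shared-task features.

def train_ap(examples, epochs=5):
    """examples: list of (feature_dict, label) pairs, label in {-1, +1}."""
    w = defaultdict(float)       # current perceptron weights
    w_sum = defaultdict(float)   # running sum of weights, for averaging
    steps = 0
    for _ in range(epochs):
        for feats, y in examples:
            score = sum(w[f] * v for f, v in feats.items())
            if y * score <= 0:           # misclassified: standard update
                for f, v in feats.items():
                    w[f] += y * v
            for f in w:                  # accumulate after every example
                w_sum[f] += w[f]
            steps += 1
    # Averaging the weight vectors reduces variance from late updates.
    return {f: s / steps for f, s in w_sum.items()}

data = [({"prev=in": 1.0}, +1), ({"prev=is": 1.0}, -1)]
weights = train_ap(data)
print(weights)
```

The averaging step is what distinguishes AP from the vanilla perceptron: the returned model is the mean of the weight vectors after every training step, which behaves like a cheap regularizer.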
“…We carefully examine the factors involved in a wide range of features that have been or can be used for the word-label classification task. Many features considered effective in previous work (Rozovskaya et al., 2012; Han et al., 2006; Rozovskaya et al., 2011; Tetreault and Chodorow, 2008) are included. In addition, features used in similar spell-checking tasks (Jia et al., 2013b; Yang et al., 2012) and some novel features shown to be effective in other NLP tasks (Xu and Zhao, 2012; Ma and Zhao, 2012; Zhao, 2009; Zhao et al., 2009b) are also included.…”
Section: Feature Selection and Generation
confidence: 99%
“…For example, correction of the Vform (verb form) error type covers all verb-form inflections, such as converting a verb to its infinitive form, gerund form, past form, past participle, and so on. Previous works (Rozovskaya et al., 2012; Kochmar et al., 2012) manually decompose each error type into more detailed subtypes. For example, in one such work, the determiner errors are decomposed into:…”
Section: Data Labeling
confidence: 99%
“…We carefully examine the factors involved in a wide range of features that have been or can be used for the word-label classification task. Many features considered effective in previous work (Rozovskaya et al., 2012; Han et al., 2006; Rozovskaya et al., 2011; Tetreault and Chodorow, 2008) are included. In addition, features used in similar spell-checking tasks (Jia et al., 2013b; Yang et al., 2012) and some novel features shown to be effective in other NLP tasks (Zhang and Zhao, 2013; Xu and Zhao, 2012; Ma and Zhao, 2012; Zhao, 2009; Zhao et al., 2009b) are also included.…”
Section: Feature Selection and Generation
confidence: 99%