2017
DOI: 10.1515/pralin-2017-0015
A Neural Network Architecture for Detecting Grammatical Errors in Statistical Machine Translation

Abstract: In this paper we present a Neural Network (NN) architecture for detecting grammatical errors in Statistical Machine Translation (SMT) using monolingual morpho-syntactic word representations in combination with surface and syntactic context windows. We test our approach on two language pairs and two tasks, namely detecting grammatical errors and predicting overall post-editing effort. Our results show that this approach is not only able to accurately detect grammatical errors but it also performs well as a qual…


Cited by 10 publications (2 citation statements)
References 18 publications
“…The work is further extended in [2], where probabilistic parsing features are incorporated with the POS n-grams and XLE-based features to improve the results. In [6], the authors propose a classifier to detect grammatical mistakes in the output produced by Statistical Machine Translation (SMT) systems. The structure of the sentences has been captured using multi-hot encoding where the word vector represents three types of information: POS tag, morphology and dependency relation.…”
Section: Background Study
confidence: 99%
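The multi-hot encoding described in the statement above can be sketched as follows. Each word is represented by concatenating a one-hot part-of-speech slice, a multi-hot morphology slice, and a one-hot dependency-relation slice. The tag inventories below are illustrative placeholders, not the feature sets used in the cited paper.

```python
# Hypothetical multi-hot word encoding: [POS one-hot | morphology multi-hot | deprel one-hot].
# The tag sets here are toy examples for illustration only.
POS_TAGS = ["NOUN", "VERB", "DET", "ADJ"]
MORPH_FEATS = ["Sing", "Plur", "Pres", "Past", "3rd"]
DEP_RELS = ["nsubj", "obj", "det", "root"]

def encode_word(pos, morph_feats, dep):
    """Return one multi-hot vector combining POS tag, morphology, and dependency relation."""
    vec = [0] * (len(POS_TAGS) + len(MORPH_FEATS) + len(DEP_RELS))
    vec[POS_TAGS.index(pos)] = 1                                  # POS slice: exactly one bit
    for feat in morph_feats:                                      # morphology slice: several bits
        vec[len(POS_TAGS) + MORPH_FEATS.index(feat)] = 1
    vec[len(POS_TAGS) + len(MORPH_FEATS) + DEP_RELS.index(dep)] = 1  # deprel slice: one bit
    return vec

# "reads" in "she reads": VERB, {Sing, Pres, 3rd}, root
v = encode_word("VERB", ["Sing", "Pres", "3rd"], "root")
```

Unlike a single one-hot code, the vector carries several active bits at once, one per type of information, which is what makes the representation "multi-hot".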
“…For example, the sentence "I am reading a book" will be transformed into the sequence < I > <am> <reading> <a> <book>. In the second approach, a sentence is converted into the sequence of tokens which indicate its structural or syntactic information [6,7]. We call these types of sequences syntactic.…”
Section: Introduction
confidence: 99%
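The two sequence views contrasted in the statement above can be sketched briefly: a surface sequence keeps the word forms themselves, while a syntactic sequence replaces each token with structural information such as its POS tag. The toy POS lookup below is an assumption for illustration, not a real tagger.

```python
# Surface vs. syntactic sequence views of a sentence.
# TOY_POS is a hypothetical lookup table, not an actual POS tagger.
TOY_POS = {"I": "PRON", "am": "AUX", "reading": "VERB", "a": "DET", "book": "NOUN"}

def surface_sequence(sentence):
    """Surface view: the sequence of word forms, e.g. <I> <am> <reading> <a> <book>."""
    return sentence.split()

def syntactic_sequence(sentence):
    """Syntactic view: each token replaced by its (toy) POS tag."""
    return [TOY_POS.get(tok, "X") for tok in sentence.split()]

surface = surface_sequence("I am reading a book")
syntactic = syntactic_sequence("I am reading a book")
```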