While a wide variety of grammatical mistakes may be observed in the speech of non-native speakers, the types and frequencies of these mistakes are not random. Certain parts of speech, for example, have been shown to be especially problematic for Japanese learners of English [1]. Modelling these errors can potentially enhance the performance of computer-assisted language learning systems. This paper presents an automatic method to estimate an error model from a non-native English corpus, focusing on articles and prepositions. A fine-grained analysis is achieved by conditioning the errors on appropriate words in the context.
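The idea of conditioning errors on a context word can be illustrated with a minimal sketch. This is not the estimation method of the paper, which the abstract does not specify; it is a plain relative-frequency estimate, assuming the corpus has already been reduced to (context word, intended form, produced form) triples. The function name `estimate_error_model` and the toy observations are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_error_model(observations):
    """Estimate P(produced | context, intended) by relative frequency.

    `observations` is an iterable of (context, intended, produced) triples.
    An empty string stands for "no article". This input format is an
    assumption made for illustration only.
    """
    counts = defaultdict(Counter)
    for context, intended, produced in observations:
        counts[(context, intended)][produced] += 1

    model = {}
    for key, produced_counts in counts.items():
        total = sum(produced_counts.values())
        model[key] = {form: n / total for form, n in produced_counts.items()}
    return model


# Made-up toy data, not drawn from any real learner corpus.
observations = [
    ("book", "the", "the"),
    ("book", "the", "a"),
    ("book", "the", "the"),
    ("book", "the", ""),
    ("information", "", "an"),
    ("information", "", ""),
]

model = estimate_error_model(observations)
for (context, intended), dist in model.items():
    print(context, repr(intended), dist)
```

Keying the counts on the pair (context, intended) is what makes the analysis fine-grained: the same intended article can have a different error distribution for each context word.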
Learner corpora consist of texts produced by non-native speakers. In addition to these texts, some learner corpora also contain error annotations, which can reveal common errors made by language learners, and provide training material for automatic error correction. We present a novel type of error-annotated learner corpus containing sequences of revised essay drafts written by non-native speakers of English. Sentences in these drafts are annotated with comments by language tutors, and are aligned to sentences in subsequent drafts. We describe the compilation process of our corpus, present its encoding in TEI XML, and report agreement levels on the error annotations. Further, we demonstrate the potential of the corpus to facilitate research on textual revision in L2 writing, by conducting a case study on verb tenses using ANNIS, a corpus search and visualization platform.
This paper is concerned with the task of preposition generation in the context of a grammar checker. Relevant features for this task can range from lexical features, such as words and their part-of-speech tags in the vicinity of the preposition, to syntactic features that take into account the attachment site of the prepositional phrase (PP), as well as its argument/adjunct distinction. We compare the performance of these different kinds of features in a memory-based learning framework. Experiments show that using PP attachment information can improve preposition generation accuracy on Wall Street Journal texts.
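Memory-based learning stores training instances and classifies a new instance by its nearest stored neighbours. The sketch below shows that general idea for preposition generation, using a simple feature-overlap similarity and majority voting. The feature layout (attachment-site head, its part-of-speech tag, head of the PP object), the function names, and the toy instances are all assumptions for illustration; the abstract does not give the paper's actual feature set, similarity metric, or toolkit.

```python
from collections import Counter


def overlap(a, b):
    """Count the feature positions where two instances agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def predict_preposition(memory, features, k=3):
    """Return the majority preposition among the k most similar instances.

    `memory` is a list of (feature_tuple, preposition) pairs.
    """
    ranked = sorted(memory, key=lambda item: overlap(item[0], features),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical instances: (attachment-site head, its POS tag, PP object head).
memory = [
    (("arrive", "VBD", "station"), "at"),
    (("arrive", "VBD", "airport"), "at"),
    (("depend", "VBZ", "weather"), "on"),
    (("rely", "VBZ", "weather"), "on"),
    (("live", "VBP", "city"), "in"),
]

print(predict_preposition(memory, ("arrive", "VBD", "hotel")))
print(predict_preposition(memory, ("depend", "VBZ", "traffic")))
```

In this layout the first two features stand in for PP attachment information; dropping them from the tuples would leave only the lexical feature, which is one way to compare feature types in such a setup.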
This paper concerns our recent research in developing high-quality spoken language translation for restricted domains. The intended application is a spoken-language translation aid for a student of a foreign language. A significant novelty of the work is in leveraging an existing English-to-Mandarin translation system in the weather domain both to provide a corpus of sentence pairs for training and to induce an initial version of the parsing grammar for translation in the reverse direction. Using an interlingual approach, we are able to reject strings that fail to parse, yielding high accuracy on any translations provided to the student. On a test set of 369 naturally spoken Mandarin queries, the translation was judged incorrect for fewer than 3% of the query transcripts. A statistical phrase-based translation system performed significantly worse when trained on the same material.
This article presents the first study on using a parallel corpus to teach Cantonese, the variety of Chinese spoken in Hong Kong. We evaluated this approach with Mandarin-speaking undergraduate students at the beginner level. Exploiting their knowledge of Mandarin, a closely related language, the students studied Cantonese with authentic material in a Cantonese-Mandarin parallel corpus, transcribed from television programs. They were given a list of Mandarin words that yield a range of possible Cantonese translations, depending on the linguistic context. Leveraging sentence and word alignments in the parallel corpus, the students independently searched for example sentences to discover these translation equivalents. Experimental results showed that, in both the short- and long-term, this data-driven learning approach helped students improve their knowledge of Cantonese vocabulary. These results suggest the potential of applying parallel corpora at even the beginners’ level for other L1-L2 pairs of closely related languages.