Summary. Learner corpora are gaining popularity in the Baltic States as well as elsewhere in the world. The aim of the article is to discuss what kinds of annotation have been used in learner corpus research in Latvia and Lithuania so far and to describe which of them would be most suitable for the newly created learner corpus of the second Baltic language, Esam. Much learner corpus research in Latvia and Lithuania is undertaken without any annotation. The most common types of annotation are those based on the theory of levels of language: morphological and syntactic annotation. There is little collaboration between researchers of the neighbouring countries, but linguists within each country collaborate closely with each other, using similar annotation schemes and creating corpora that are comparable in some aspects. The learner corpus of the second Baltic language should try to fit into this picture to some extent. Part-of-speech annotation and simple syntactic annotation could help with that. However, kinds of annotation that have not yet become popular in learner corpus research in this region could also be useful. Therefore, error annotation and lemmatization have also been included in the annotation plan of the corpus Esam.
Errors in language learning are seen as normal and even necessary. However, research on them is often hampered by the need for a clear definition of what is or should be considered an error and by the lack of an error taxonomy. This paper briefly discusses the notion of error in various contexts, especially in learner corpus research. It then offers an error taxonomy that was created for error-tagging a learner corpus of Baltic languages. The aim of the study is to create a taxonomy that is suitable for annotating beginner texts in Latvian and Lithuanian and that is efficient in use. The taxonomy is based on the previous work of S. Granger, who identified error types for a learner corpus of French. These error types are reviewed, modified and/or replaced where necessary in order to match the structure of the Latvian and Lithuanian languages. Five error types (form; morphology and word-formation; syntax; vocabulary; punctuation) with 29 subtypes are distinguished. They are described in the article along with examples from the corpus. The taxonomy is now being used for annotating the learner corpus of the second Baltic language, which provides researchers with valuable material on language learning outcomes.
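As a rough illustration of how the five top-level error types could be represented when tagging a text, the following Python sketch may help. Only the five type names come from the abstract; the `ErrorTag` record, its fields, the `count_by_type` helper and the subtype placeholder are hypothetical and do not reflect the actual annotation format of the corpus. The 29 subtypes are not listed in the abstract, so they are left as a free-text field.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    """The five top-level error types named in the abstract."""
    FORM = "form"
    MORPHOLOGY_AND_WORD_FORMATION = "morphology and word-formation"
    SYNTAX = "syntax"
    VOCABULARY = "vocabulary"
    PUNCTUATION = "punctuation"


@dataclass(frozen=True)
class ErrorTag:
    """One error annotation on a span of learner text (hypothetical layout)."""
    start: int             # character offset where the erroneous span begins
    end: int               # character offset just past the span
    error_type: ErrorType  # one of the five top-level types
    subtype: str           # placeholder: the real subtype labels are not given here
    correction: str        # annotator's suggested target form


def count_by_type(tags):
    """Return how many tags fall under each top-level error type."""
    counts = {error_type: 0 for error_type in ErrorType}
    for tag in tags:
        counts[tag.error_type] += 1
    return counts


# Placeholder annotations; offsets and corrections are invented for illustration.
example_tags = [
    ErrorTag(0, 4, ErrorType.SYNTAX, "subtype-placeholder", "target form"),
    ErrorTag(10, 15, ErrorType.VOCABULARY, "subtype-placeholder", "target form"),
    ErrorTag(20, 21, ErrorType.PUNCTUATION, "subtype-placeholder", ","),
]
type_counts = count_by_type(example_tags)
```

Keeping the top-level type as a closed enumeration and the subtype as an open label is only one possible design; it makes per-type counts straightforward while leaving room for whatever subtype inventory the taxonomy defines.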