The Latvian Language Learners Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia, includes more than 1000 texts written by foreign learners of Latvian who are studying at Latvian higher education institutions in their first or second semester and are reaching the A1 (possibly A2) proficiency level in Latvian. The corpus contains more than 180 000 words. The morphological annotation of the texts has been checked manually, and the learners' errors have been annotated manually. In addition, each text is accompanied by metadata about its author: gender, age, native language, and knowledge of other languages. In data analysis, this information can be used to determine how the learner's mother tongue and general language skills affect the acquisition of Latvian. Users of the corpus can analyse the data both on the LaVA website (see http://lava.korpuss.lv/search) and in the SketchEngine tool, where both quantitative and qualitative analysis can be performed. The quantitative approach reveals tendencies in the use of a word, word form, or construction and makes it possible to determine the frequency of mistakes made by language learners. The objectivity of the research is further supported by examining the learner data from different perspectives and by repeating the analysis. For example, a statistical analysis of the nouns used in learners' texts shows that declension 4 nouns are used most often. Next in frequency are declension 1, 5, and 2 nouns, while declension 3 and 6 nouns and indeclinable nouns are used very rarely. Qualitative analysis, based on empirical data, reveals particular features of morphology and word formation, including aspects of syntax. The erroneous use of nouns, verbs, or other parts of speech can be analysed qualitatively in an attempt to understand which rules govern it. 
Examples include using non-reflexive verbs instead of reflexive verbs, using infinitives instead of finite (person) forms, and using a suffix that does not fit the noun paradigm. Exercises and tests are generated on the basis of LaVA data analysis, including the analysis of learners' errors. The exercises are intended to help language learners strengthen their linguistic competence in Latvian, for example, the use of verb forms in the indicative mood, in both indefinite and perfect tense forms. Exercise creation consists of three stages: (1) analysis of LaVA errors and identification of typical errors, (2) collection of sample sentences from various corpora of the Latvian language (for example, LVK2018 and Saeima) that contain the word forms and constructions in which learners most often make mistakes in LaVA texts, and (3) generation of different exercises using the selected sample sentences.
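The three stages of exercise creation can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used for LaVA: the error labels, the example sentences, and the function names (`typical_errors`, `select_sentences`, `make_gap_fill`, `build_exercises`) are all hypothetical, and a gap-fill task is just one possible exercise type.

```python
from collections import Counter

# Hypothetical error annotations: (error type, intended correct form).
# Invented for illustration; not taken from LaVA.
annotated_errors = [
    ("reflexive_verb", "mācos"),
    ("reflexive_verb", "mazgājas"),
    ("infinitive_for_finite", "runāju"),
    ("reflexive_verb", "mācos"),
]

# Hypothetical sample sentences standing in for a reference corpus.
sample_sentences = [
    "Es mācos latviešu valodu.",
    "Viņa mazgājas katru rītu.",
    "Es runāju latviski.",
]

PUNCTUATION = ".,!?"


def typical_errors(errors, top_n=1):
    """Stage 1: return the most frequent error types."""
    counts = Counter(error_type for error_type, _ in errors)
    return [error_type for error_type, _ in counts.most_common(top_n)]


def select_sentences(sentences, target_forms):
    """Stage 2: keep sentences that contain one of the target word forms."""
    selected = []
    for sentence in sentences:
        tokens = [word.strip(PUNCTUATION) for word in sentence.split()]
        for form in target_forms:
            if form in tokens:
                selected.append((sentence, form))
                break
    return selected


def make_gap_fill(sentence, form):
    """Stage 3: blank out the target form to produce a gap-fill item."""
    words = sentence.split()
    for i, word in enumerate(words):
        core = word.strip(PUNCTUATION)
        if core == form:
            # Keep any attached punctuation, replace only the word itself.
            words[i] = word.replace(core, "_____", 1)
            return {"prompt": " ".join(words), "answer": form}
    return None


def build_exercises(errors, sentences, top_n=1):
    """Run the three stages in sequence."""
    top_types = typical_errors(errors, top_n)
    # Deduplicate the target forms while preserving their first-seen order.
    forms = list(dict.fromkeys(f for t, f in errors if t in top_types))
    return [make_gap_fill(s, f) for s, f in select_sentences(sentences, forms)]


exercises = build_exercises(annotated_errors, sample_sentences)
for item in exercises:
    print(item["prompt"], "->", item["answer"])
```

In this sketch the most frequent error type determines which word forms are searched for in the sample sentences, so the resulting exercises target the forms learners get wrong most often.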
A learner corpus is a systematically computerised database of texts produced by language learners (of both foreign and second languages). It is the basis for studying the characteristics of foreign learners' language and for developing data-driven Latvian teaching materials and methodological aids. Like other language corpora, a learner corpus can be annotated at various linguistic levels (morphologically, syntactically), but error annotation and the error analysis based on it are particularly important in the study of learner language. Error analysis is influenced by two factors: (1) the selected error types, or error typology, and (2) the proposed target hypotheses, i.e., the corrected text. It is therefore essential to agree, before errors are annotated, on what will be annotated and how. The introduction of the article briefly describes the "Latvian Language Learners Corpus" (LaVA), which is under development, and examines the notion of the target hypothesis and its role in building a learner corpus. The article sets out the main principles for formulating target hypotheses in the LaVA corpus and gives concrete examples of how learners' utterances are corrected in accordance with the norms of Latvian and which major deviations from those norms are permitted.
The paper discusses the various meanings of the words būt and būti ‘to be’ in Latvian and Lithuanian and their acquisition. The aim of the research is to discover in what meanings these words are used by learners of the second Baltic language, i.e., learners whose mother tongue is a Baltic language and who are learning the other Baltic language: Latvian for Lithuanians and Lithuanian for Latvians. The two languages are closely related, and the meanings are similar. However, the meanings are grouped differently in each linguistic tradition. Therefore, this paper attempts to group the meanings from both languages into a single system that can be used to compare usage tendencies. This system is based on the Latvian grouping tradition but is not concise. The meanings in which būt/būti is used may depend on the topics of the texts. The location meaning appears to be overused in texts written by learners of Latvian, whereas learners of Lithuanian seem to use the verb in a wider variety of meanings. Perfect tenses are also more frequent in texts written by learners of Lithuanian. This seems to hint at a systemic difference between the Baltic languages: Latvian tends to use perfect tenses in situations where Lithuanian does not.