The paper presents an overview of the development and research in Lithuanian language technologies (LT) for the period 2016–2020. The most significant national and international LT-related initiatives, projects, research infrastructures, language resources and tools are discussed. The paper also surveys research production in the field of language technology for the Lithuanian language. The analysis of scientific papers shows that machine translation and speech technologies were the most prominent research topics in 2016–2019.
Due to the fast pace of life and online communications and the prevalence of English and the QWERTY keyboard, people tend to forgo using diacritics and make typographical errors (typos) when typing in other languages. Restoring diacritics and correcting spelling is important for proper language use and the disambiguation of texts for both humans and downstream algorithms. However, these two problems are typically addressed separately: the state-of-the-art diacritics restoration methods do not tolerate other typos, while classical spellcheckers cannot deal adequately with text in which all the diacritics are missing. In this work, we tackle both problems at once by employing the newly developed universal ByT5 byte-level seq2seq transformer model, which requires no language-specific model structures. For comparison, we perform diacritics restoration on benchmark datasets of 12 languages, with the addition of Lithuanian. The experimental investigation shows that our approach is able to achieve results (>98%) comparable to the previous state of the art, despite being trained less and on less data. Our approach is also able to restore diacritics in words not seen during training with >76% accuracy. Our simultaneous diacritics restoration and typo correction approach reaches >94% alpha-word accuracy on the 13 languages. It has no direct competitors and strongly outperforms classical spell-checking or dictionary-based approaches. We also demonstrate that all the accuracies improve further with more training. Taken together, this shows the great real-world application potential of our suggested methods when extended to more data, languages, and error classes.
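To make the task setup in the abstract above concrete, the following is a minimal sketch and not the authors' pipeline. It assumes training inputs can be produced by stripping combining marks from correctly written text, shows the UTF-8 byte sequence a byte-level model would operate on, and scores output with a simplified word-level accuracy. The helper names `strip_diacritics`, `to_bytes`, and `alpha_word_accuracy` are hypothetical, and the metric here (exact match over reference words containing at least one letter, aligned by position) is an assumed reading of "alpha-word accuracy", not the paper's definition.

```python
import unicodedata
from typing import List


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition.

    Works for letters that decompose into base + mark (e.g. Lithuanian
    ą, č, ė, š, ū, ž); letters without a decomposition are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_bytes(text: str) -> List[int]:
    """Raw UTF-8 byte values, the unit a byte-level model works on."""
    return list(text.encode("utf-8"))


def alpha_word_accuracy(predicted: str, reference: str) -> float:
    """Assumed metric: share of reference words containing a letter
    that the prediction reproduces exactly at the same position."""
    pred_words = predicted.split()
    ref_words = reference.split()
    total = 0
    correct = 0
    for i, ref in enumerate(ref_words):
        if not any(ch.isalpha() for ch in ref):
            continue  # skip numbers and punctuation-only tokens
        total += 1
        if i < len(pred_words) and pred_words[i] == ref:
            correct += 1
    return correct / total if total else 0.0


reference = "ąžuolas žydi 2020"
model_input = strip_diacritics(reference)  # text with diacritics removed
print(model_input)
print(to_bytes("ž"))
print(alpha_word_accuracy("azuolas žydi 2020", reference))
```

In this setup a restoration model would be trained to map `model_input` back to `reference`; because the input is a byte sequence, no language-specific vocabulary is needed, which matches the motivation given in the abstract.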
The securitization of the COVID-19 pandemic allowed governments in democratic countries to introduce extraordinary management measures that involved limiting various human rights. However, sound democratic governance always requires public debate on any policies introduced. These debates occur in multiple arenas, and the parliament is among the most notable. In the context of human rights, some studies identified parliament as one of the most important agencies that promote human rights protection and oversee executive authorities (Lyer, 2019; Ncube, 2020). This article examines whether and how Lithuanian parliamentarians and government members addressed human rights during the Seimas debates when issues related to the COVID-19 pandemic were discussed. It investigates whether the Seimas could be considered an important agent contributing to the oversight of human rights in Lithuania. The article employs transcripts from the Seimas plenary debates as a data source, particularly speeches from the government question time from March 2020 to January 2021. The results of the qualitative thematic analysis revealed that human rights were generally not the main topic of the COVID-19 pandemic debates on the Seimas floor during government hours. They also showed that the attitudes of political parties toward specific human rights tended to shift when they switched from the opposition to the ruling majority and vice versa.
This resource is the language technology bibliography for the Lithuanian language for the period 2016-2020. It is in BibTeX format and contains: 1) 91 references to research publications, 2) 15 references to documents and strategies, and 3) 26 references to language resources and tools. The resource is used for the paper: Utka, Andrius, Jurgita Vaičenonienė, Monika Briedienė and Tomas Krilavičius. 2020. Development and Research in Lithuanian Language Technologies (2016-2020). In Human language technologies - the Baltic perspective: proceedings of the 9th international conference, Baltic HLT 2020, Kaunas (Lithuania). Amsterdam: IOS Press.