LVBERT: Transformer-Based Model for Latvian Language Understanding

Znotiņš, Artūrs; Bārzdiņš, Guntis

doi:10.3233/faia200610

Cited by 11 publications

(9 citation statements)

References 4 publications

“…The endings of borrowed names, which become formants of new names, confirm the openness of the Lithuanian name stock to innovation. which was replaced with a domain-adapted named entity recognizer (Znotiņš & Bārzdiņš, 2020), thereby improving the accuracy of named entity recognition. To ensure successful information retrieval from the knowledge base, the entities identified in the user's utterance are converted to their base form, and the dialogue system also includes a component for recognizing and normalizing synonymous uses of entities.…”

Section: What Human Genetics Can Give Linguistics a Case Of Region Around Baltics - Germanic Baltic And Finno-ugric Languages (unclassified)

XIII Starptautiskais baltistu kongress “Baltu valodas laikā un telpā” : referātu tēzes

2021

“…The neural network model is based on Latvian BERT word embeddings (Znotins and Barzdins, 2020). To support entity classes of a particular domain, the NER model is trained on a larger general-domain dataset (Gruzitis et al, 2018; Paikens et al, 2020) and a smaller domain-specific dataset.…”

Section: Entity Database (mentioning)

confidence: 99%

“…The tool has been successfully used for the creation of a dialogue dataset for the virtual assistant that supports the work of the student service in relation to three frequently asked topics: working hours and contacts of the personnel and structural units (e.g. libraries), issues regarding academic leave, as well as enrollment requirements and documents to be submitted (Skadina and Gosko, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

Domain Expert Platform for Goal-Oriented Dialog Collection

Gosko, Znotiņš, Skadiņa et al., 2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrati

Today, most dialogue systems are fully or partly built using neural network architectures. A crucial prerequisite for the creation of a goal-oriented neural network dialogue system is a dataset that represents typical dialogue scenarios and includes various semantic annotations, e.g. intents, slots and dialogue actions, that are necessary for training a particular neural network architecture. In this demonstration paper, we present an easy-to-use interface and its backend which is oriented to domain experts for the collection of goal-oriented dialogue samples. The platform not only allows users to collect or write sample dialogues in a structured way, but also provides a means for simple annotation and interpretation of the dialogues. The platform itself is language-independent; it depends only on the availability of particular language processing components for a specific language. It is currently being used to collect dialogue samples in Latvian (a highly inflected language) which represent typical communication between students and the student service.

“…The LVBERT model [5] yields slightly worse results for POS tagging (98.1% accuracy) and NER (82.6% accuracy) tasks than the best embeddings in [4]. Likewise, the Latvian BERT model in [6] applied to NER task achieved an F1 score of 81.91% while using a significantly smaller training corpus than in [4].…”

Section: Related Work (mentioning)

confidence: 99%

Evaluation of Word Embedding Models in Latvian NLP Tasks Based on Publicly Available Corpora

Laucis, Jēkabsons, 2021

Applied Computer Systems

Nowadays, natural language processing (NLP) is increasingly relying on pre-trained word embeddings for use in various tasks. However, there is little research devoted to Latvian – a language that is much more morphologically complex than English. In this study, several experiments were carried out in three NLP tasks on four different methods of creating word embeddings: word2vec, fastText, Structured Skip-Gram and ngram2vec. The obtained results can serve as a baseline for future research on the Latvian language in NLP. The main conclusions are the following: First, in the part-of-speech task, using a training corpus 46 times smaller than in a previous study, the accuracy was 91.4 % (versus 98.3 % in the previous study). Second, fastText demonstrated the overall best effectiveness. Third, the best results for all methods were observed for embeddings with a dimension size of 200. Finally, word lemmatization generally did not improve results.
