2019
DOI: 10.48550/arxiv.1912.07076
Preprint
Multilingual is not enough: BERT for Finnish

Cited by 75 publications (113 citation statements)
References 0 publications
“…These models were found to perform well on tasks involving code-mixed text too [1,18]. Given that these models have to support over 100 languages with the limited model capacity they have, some works have found that they are outperformed by monolingual versions, even on some low-resourced languages [8,25,27,31]. Some of the aforementioned works [26,33,34] have theorized about the reasons why crosslingual transfer works and have stated that factors like pretraining data size and vocabulary overlap between languages could affect the transfer performance of a language in these models.…”
Section: Related Work (mentioning)
confidence: 99%
“…BERT models have also become available as part of the multilingual BERT model (Devlin et al., 2019) or trained separately for Finnish (Kutuzov et al., 2017; Virtanen et al., 2019).…”
Section: Semantics (mentioning)
confidence: 99%
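The statement above notes that Finnish BERT models are available both through multilingual BERT and as a separately trained monolingual model. As a minimal sketch of what using those checkpoints looks like in practice, the snippet below loads both through the Hugging Face transformers library and compares how each tokenizer segments a Finnish sentence; the model identifiers and the example sentence are illustrative assumptions, not taken from the cited works.

    # Hedged sketch: the Finnish BERT referenced above next to multilingual BERT,
    # loaded from the Hugging Face Hub. The model identifiers are assumptions
    # based on the publicly hosted checkpoints, not taken from this report.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL_IDS = [
        "TurkuNLP/bert-base-finnish-cased-v1",  # assumed FinBERT checkpoint (Virtanen et al., 2019)
        "bert-base-multilingual-cased",         # assumed multilingual BERT checkpoint (Devlin et al., 2019)
    ]

    sentence = "Helsingin yliopisto on Suomen vanhin yliopisto."  # illustrative example

    for model_id in MODEL_IDS:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForMaskedLM.from_pretrained(model_id)  # usable for masked-LM inference or fine-tuning
        # A language-specific vocabulary usually splits Finnish words into fewer,
        # less fragmented subword pieces than the shared multilingual vocabulary.
        pieces = tokenizer.tokenize(sentence)
        print(f"{model_id}: {len(pieces)} subwords -> {pieces}")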
“…They are often based on BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019) and other contextualised architectures. A number of language-specific initiatives have in recent years released monolingual versions of these models for a number of languages (Fares et al., 2017; Kutuzov and Kuzmenko, 2017; Virtanen et al., 2019; de Vries et al., 2019; Ulčar and Robnik-Šikonja, 2020; Koutsikakis et al., 2020; Nguyen and Nguyen, 2020; Farahani et al., 2020; Malmsten et al., 2020). For our purposes, the most important such previous training effort is that of Virtanen et al. (2019) on creating a BERT model for Finnish, FinBERT, as our training setup for creating NorBERT builds heavily on this; see Section 6 for more details.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our NorBERT model is trained from scratch for Norwegian, and can be used in exactly the same way as any other BERT-like model. The NorBERT training setup heavily builds on prior work on FinBERT conducted at the University of Turku (Virtanen et al., 2019).…”
Section: NorBERT (mentioning)
confidence: 99%