2019
DOI: 10.48550/arxiv.1906.01502
Preprint

How multilingual is Multilingual BERT?

citations
Cited by 120 publications
(103 citation statements)
references
References 11 publications
“…over the years and have decided to finalize Bi-LSTM-CNN [3], Bi-GRU-CNN [3], Transformer [12], char-CNN [21] and mBERT [15] based architectures for demonstration of the model agnostic nature of our adversarial attack technique. The maximum input sequence length, vocabulary size, learning rate for these experiments were set at 25, 17k, and 0.001 respectively.…”
Section: Experiments and Results
Type: mentioning
confidence: 99%
“…We introduce a three-step attack strategy that can be used for generating adversarial examples using minimal resources for any type of code-mixed data (with and without transliteration). We have used our framework to evaluate the success of adversarial attacks on a few sentiment classification models [14,15,21] that have been diagnosed effective on code-mixed data. Research on adversarial techniques has become an important aspect, especially for security-critical applications, as it helps us in both analyzing the fallacies of the models, and make them more robust.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Oversampling of low resource languages is done to overcome data imbalance. It has shown great results on zero-shot transfer learning for various downstream tasks and also helped in code-switched data tasks [21].…”
Section: Multilingual-BERT (M-BERT)
Type: mentioning
confidence: 99%
“…Another set of studies have identified ways to make these models more efficient by methods such as pruning (McCarley, 2019; Gordon et al., 2020; Sajjad et al., 2020; Budhraja et al., 2020). A third set of studies show that multilingual extensions of these models, such as Multilingual BERT (Devlin et al., 2019), have surprisingly high crosslingual transfer (Pires et al., 2019; Wu and Dredze, 2019).…”
Section: Introduction
Type: mentioning
confidence: 99%