2020
DOI: 10.48550/arxiv.2002.12327
Preprint

A Primer in BERTology: What we know about how BERT works

Cited by 51 publications (65 citation statements). References 0 publications.
“…The distribution of linguistic information differs not only by model and input data (Ethayarajh 2019; Gao et al. 2019), but even for different hyperparameters, training runs, or contexts (Tenney, Das, and Pavlick 2019). Indeed, it may not be uniquely identifiable at all (Rogers, Kovaleva, and Rumshisky 2020). For that reason, an aggregation of layer representations that provides semantic similarity or distances between tokens currently requires exogenous labels for a probing procedure (Hewitt and C. D. Manning 2019).…”
Section: A1 Challenges in the Application of Deep Language Models
confidence: 99%
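To make the quoted claim concrete, here is a minimal sketch, assuming the HuggingFace transformers library and bert-base-uncased, of the kind of layer aggregation the statement refers to: hidden states are averaged across layers and a cosine similarity is read off between two token representations. The aggregation rule (a plain mean) and the example sentence are illustrative assumptions, not anything prescribed by the cited works.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "bert-base-uncased", output_hidden_states=True
).eval()

enc = tokenizer("the bank approved the loan", return_tensors="pt")
with torch.no_grad():
    layers = model(**enc).hidden_states  # tuple of 13: embeddings + 12 layers

# Naive aggregation: a plain mean over all layers for each token position.
agg = torch.stack(layers).mean(dim=0)[0]  # (seq_len, hidden_size)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
i, j = tokens.index("bank"), tokens.index("loan")
sim = torch.nn.functional.cosine_similarity(agg[i], agg[j], dim=0)
print(f"cosine(bank, loan) = {sim.item():.3f}")
```

Nothing guarantees that such an unsupervised aggregation yields calibrated semantic distances; that gap is exactly why the statement says exogenous labels and a probing procedure are currently required.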
“…For that reason, an aggregation of layer representations that provides semantic similarity or distances between tokens currently requires exogenous labels for a probing procedure (Hewitt and C. D. Manning 2019). Even if available, this procedure introduces a gap between the training objective of the model and the analysis procedure used for inference, possibly leading to bias (Rogers, Kovaleva, and Rumshisky 2020).…”
Section: A1 Challenges in the Application of Deep Language Models
confidence: 99%
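As a rough illustration of such a probing procedure, the sketch below (assuming HuggingFace transformers, scikit-learn, and hypothetical toy part-of-speech tags standing in for the exogenous labels) fits a linear probe on top of one frozen BERT layer. The layer index and the label set are arbitrary assumptions for illustration; the actual structural probe of Hewitt and Manning predicts syntactic tree distances rather than tags.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "bert-base-uncased", output_hidden_states=True
).eval()

# Hypothetical toy data: the tags are the exogenous labels the probe needs.
sentences = [["the", "cat", "sleeps"], ["a", "dog", "barks"]]
tags      = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]

features, targets = [], []
with torch.no_grad():
    for words, word_tags in zip(sentences, tags):
        enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
        hidden = model(**enc).hidden_states[8][0]  # layer 8: arbitrary choice
        seen = set()
        for pos, wid in enumerate(enc.word_ids()):
            if wid is None or wid in seen:
                continue  # skip [CLS]/[SEP] and subword continuations
            seen.add(wid)
            features.append(hidden[pos].numpy())
            targets.append(word_tags[wid])

# The probe: a linear classifier trained on the frozen representations, with
# an objective unrelated to BERT's pre-training loss -- the "gap" the quote
# points to as a possible source of bias.
probe = LogisticRegression(max_iter=1000).fit(features, targets)
print("probe training accuracy:", probe.score(features, targets))
```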
“…There is a growing field of study concerned with investigating the inner workings of large-scale transformers like BERT. A new term, BERTology, has even been coined to describe the related research carried out around BERT [3]. Our work focuses on the BERT input embedding, which is both critical to the BERT model and easy to overlook.…”
Section: Introduction
confidence: 99%
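For context on what that input embedding is, here is a minimal sketch, assuming the HuggingFace transformers implementation of bert-base-uncased: the input embedding is recomposed by hand as the sum of token, position, and token-type (segment) lookups followed by LayerNorm, and checked against the module's own forward pass.

```python
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()
emb = model.embeddings

ids = tokenizer("BERTology studies BERT", return_tensors="pt")["input_ids"]
positions = torch.arange(ids.size(1)).unsqueeze(0)
segments = torch.zeros_like(ids)  # a single-segment input

with torch.no_grad():
    # Recompose the input embedding by hand from its three lookup tables.
    manual = (emb.word_embeddings(ids)
              + emb.position_embeddings(positions)
              + emb.token_type_embeddings(segments))
    manual = emb.LayerNorm(manual)
    # Matches the module's own output (dropout is inactive in eval mode).
    assert torch.allclose(manual, emb(ids), atol=1e-5)
```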