2019
DOI: 10.48550/arxiv.1906.01698
Preprint

Open Sesame: Getting Inside BERT's Linguistic Knowledge

Cited by 27 publications (32 citation statements)
References 19 publications
“…experimentally investigated the power of self-attention to extract word order information, finding differences between recurrent and self-attention models; however, these were modulated by the training objective. Lin et al. (2019) and Tenney et al. (2019) show that BERT (Devlin et al., 2018) encodes syntactic information.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, many researchers have studied the capability of recurrent neural network models to capture context-free languages (e.g., Kalinke and Lehmann (1998); Gers and Schmidhuber (2001); Grüning (2006); Weiss et al. (2018); Sennhauser and Berwick (2018)) and linguistic phenomena involving hierarchical structure (e.g., Linzen et al. (2016); Gulordava et al. (2018)). Some experimental evidence suggests that transformers might not be as strong as LSTMs at modeling hierarchical structure (Tran et al., 2018), though analysis studies have shown that transformer-based models encode a good amount of syntactic knowledge (e.g., Clark et al. (2019); Lin et al. (2019); Tenney et al. (2019)).…”
(mentioning)
confidence: 99%
“…the contextualized representations that these LMs compute, revealing that they encode substantial amounts of syntax and semantics (Linzen et al., 2016b; Peters et al., 2018b; Tenney et al., 2019b; Goldberg, 2019; Hewitt and Manning, 2019; Tenney et al., 2019a; Lin et al., 2019; Coenen et al., 2019).…”
Section: Introduction (mentioning)
confidence: 99%
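The work cited in this passage (e.g., Tenney et al., 2019; Hewitt and Manning, 2019; Lin et al., 2019) typically probes frozen contextualized representations with a lightweight classifier: if a simple model can read a linguistic property off the hidden states, that property is taken to be encoded in them. Below is a minimal sketch of that setup, assuming the Hugging Face transformers and scikit-learn libraries; the toy task, labels, and layer index are illustrative placeholders, not the protocol of any paper cited here.

# Minimal linear-probe sketch over frozen BERT hidden states (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Hypothetical probing data: sentence-level labels for a syntactic property
# (here, "contains an object relative clause").
sentences = ["the dog that the cat chased ran away", "the dog ran away"]
labels = [1, 0]

features = []
with torch.no_grad():
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        out = model(**enc)
        # Mean-pool token vectors from one intermediate layer; layer 8 is an
        # arbitrary choice for illustration.
        layer_states = out.hidden_states[8][0]  # (seq_len, hidden_dim)
        features.append(layer_states.mean(dim=0).numpy())

# The probe itself is deliberately simple: a linear classifier over the
# frozen representations.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.score(features, labels))

In practice such probes are trained and evaluated on held-out data and compared against baselines (e.g., non-contextual embeddings) before concluding that the representation, rather than the probe, carries the information.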
“…To discuss the contribution of the short-term properties to the representational capability for NLP tasks, we also measure the performance in the short-term range with the following three tasks: the MLM task, the semantic textual similarity benchmark (STS-B) [17], and a handwriting task (see the Appendix for the detailed setups). These layerwise analyses are similar to those in [18], which evaluates BERT's performance, and our study inspects the properties over a wider time range. In parallel, we investigate the system's global properties in the long-term analysis.…”
Section: A. ALBERT as "The Reservoir" (mentioning)
confidence: 77%

Transient Chaos in BERT

Inoue, Ohara, Kuniyoshi et al., 2021
Preprint