2020
DOI: 10.48550/arxiv.2005.09117
Preprint

Contextual Embeddings: When Are They Worth It?

Abstract: We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline, random word embeddings, focusing on the impact of the training set size and the linguistic properties of the task. Surprisingly, we find that both of these simpler baselines can match contextual embeddings on industry-scale data, and often perform within 5 to 10% accuracy (absolute) on benchmark tasks. Furthermore, we…
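The "random word embeddings" baseline from the abstract is easy to make concrete. The sketch below is a minimal illustration under assumed choices (vocabulary, dimension, Gaussian initialization), not the authors' code: each word gets one fixed random vector that never changes with context, the property it shares with GloVe and that distinguishes both from a contextual model like BERT.

```python
import numpy as np

# Minimal sketch of a random-word-embedding baseline: each vocabulary word
# gets one fixed random vector, sampled once and then frozen, so any model
# trained on top sees the same (non-contextual) vector for a word in every
# sentence. Vocabulary, dimension, and scaling here are illustrative assumptions.
rng = np.random.default_rng(seed=0)
vocab = ["the", "movie", "was", "great", "terrible"]
dim = 300  # same order of magnitude as GloVe-300d, purely for comparability

random_embeddings = {w: rng.normal(scale=1.0 / np.sqrt(dim), size=dim) for w in vocab}

def embed(tokens):
    """Look up the frozen random vector for each token (OOV tokens get zeros)."""
    return np.stack([random_embeddings.get(t, np.zeros(dim)) for t in tokens])

# The same word always maps to the same vector, regardless of context:
a = embed(["the", "movie", "was", "great"])
b = embed(["great", "was", "the", "movie"])
assert np.allclose(a[3], b[0])  # "great" is identical in both sentences
```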

Cited by 8 publications (9 citation statements); references 7 publications.
“…Therefore, the same token in different sentences will have a different embedding. This attention to context correlates with high reliability in several natural language processing tasks, such as named entity recognition, concept extraction, and sentiment analysis, and is relatively better than non-context embedding (Taillé et al, 2020; Arora et al, 2020). In addition, the form of the token, which is part of the sentence, makes BERT more adaptive to typographical errors and variations of word writing.…”
Section: Methods
Mentioning; confidence: 99%
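To illustrate the property described in the statement above, here is a short sketch (not from either cited paper; the model name, layer choice, and example sentences are assumptions) showing that a contextual encoder like BERT assigns the same surface token a different vector in each sentence, whereas a static embedding would not.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed setup: bert-base-uncased, last hidden layer, single-WordPiece target word.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embedding_of(sentence, word):
    """Return the last-layer hidden state at the position of `word`
    (assumed to survive tokenization as a single WordPiece token)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

# The same token "bank" receives a different vector in each context.
v1 = embedding_of("the bank approved the loan", "bank")
v2 = embedding_of("we sat on the river bank", "bank")
sim = torch.cosine_similarity(v1, v2, dim=0)
print(f"cosine similarity of 'bank' across contexts: {sim.item():.3f}")  # well below 1.0
```

A static embedding table (GloVe or the random baseline above) would return the identical vector for "bank" in both sentences, which is exactly the contrast the quoted statement draws.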
“…Glove [23]). A recent study [24] empirically shows that classical pretrained embeddings can match contextual embeddings on industry-scale data, and often perform within 5 to 10% accuracy (absolute) on benchmark tasks.…”
Section: Journal Pre-proof
Mentioning; confidence: 99%
“…Compared with contextualized embeddings, static embeddings like Skipgram (Mikolov et al, 2013) and GloVe (Pennington et al, 2014) are lighter and less computationally expensive. Furthermore, they can even perform without significant performance loss for context-independent tasks like lexical-semantic tasks (e.g., word analogy), or some tasks with plentiful labeled data and simple language (Arora et al, 2020).…”
Section: Introduction
Mentioning; confidence: 99%
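As a small, self-contained illustration of the context-independent, lexical-semantic setting mentioned above (word analogy), the following sketch uses static GloVe vectors via gensim; the pretrained-model key and the specific analogy are illustrative assumptions, not part of the cited work.

```python
import gensim.downloader as api

# Static GloVe vectors from gensim-data (key assumed; ~128 MB download on first use).
glove = api.load("glove-wiki-gigaword-100")

# "king" - "man" + "woman" ≈ ?  Static vectors alone handle this kind of
# lexical-semantic query reasonably well, with no contextual model involved.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```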