Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
Preprint (2021). DOI: 10.48550/arxiv.2102.08036

Cited by 18 publications (19 citation statements). References 0 publications.
“…This allows BERT to take the context of each word into account. Similar in its construction, XLNet [32] improved its masking mechanism with peculiar assumptions during its pre-training stage, and improved over the work done by BERT. Despite these advances, studies [8] have shown that even these approaches have still struggled with negation.…”
Section: Word Negation and Sequence Labeling
Confidence: 99%
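The passage above refers to BERT's bidirectional masked-language-model objective and to negation failures reported in follow-up studies. The snippet below is a minimal illustrative sketch of how such behaviour can be probed; it assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint, and the example sentences are hypothetical rather than taken from the cited work.

```python
# Probe BERT's masked-language-model head on a plain vs. negated sentence.
# Assumes: Hugging Face `transformers` installed, `bert-base-uncased` available.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT conditions on tokens both left and right of [MASK] (bidirectional context).
for sentence in [
    "The service was [MASK], so we left a big tip.",
    "The service was not [MASK], so we left a big tip.",
]:
    print(sentence)
    for p in fill_mask(sentence, top_k=3):
        print(f"  {p['token_str']:>10s}  score={p['score']:.3f}")
    # If both variants yield similar fillers, the model is effectively ignoring
    # "not" -- the kind of negation failure the cited studies describe.
```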
“…The growing complexity of text data has made NLP applications increasingly vital in analyzing large volumes of data [10,14]. NLP solutions are one of the backbones of some intelligent models like BERT, XLNet, and GPT, which have advanced the cause of sentiment analysis [21,32], and machine language translation [27]. NLP-enabled models propel the understanding of language structure and interpretation.…”
Section: Natural Language Processing (NLP)
Confidence: 99%
“…The pre-trained language models (PrLM) [13] have reached remarkable achievements in learning universal natural language representations by pre-training large language models on massive general corpus and fine-tuning them on downstream tasks. BERT [14], which is derived from the Transformer's encoder, is the most representative among PrLMs [15], the multi-head self-attention in the Transformer is a vital mechanism, it is essentially a variant of the graph attention network [16] (GAT).…”
Section: Related Work — Pre-trained Language Models
Confidence: 99%
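The statement above relates Transformer self-attention to graph attention: every token attends to every other token, i.e. attention over a fully connected token graph. The following is a minimal single-head sketch of scaled dot-product self-attention; shapes and variable names are illustrative only, and a full multi-head layer would run several such heads in parallel and concatenate their outputs.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)               # pairwise token-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # context-mixed token representations

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
out = self_attention(X, *(rng.normal(size=(d_model, d_head)) for _ in range(3)))
print(out.shape)  # (5, 8)
```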
“…Transformer (Vaswani et al, 2017), an alternative to convolutional neural networks, has dominated the field of natural language processing (NLP), including speech recognition (Dong et al, 2018), synthesis (Li et al, 2019b), text to speech translation (Vila et al, 2018), and natural language generation (Topal et al, 2021). As a example of deep learning architectures, Transformer was first introduced to handle sequential inference tasks in NLP.…”
Section: Introduction
Confidence: 99%