2019
DOI: 10.48550/arxiv.1910.13461
Preprint

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Cited by 651 publications (972 citation statements). References: 0 publications.

“…Unlike BERT, which is applicable only to language understanding via a single encoder, MASS [37] pre-trains an encoder-decoder model for language generation via a masked sequence-to-sequence learning proxy task. Most recently, BART [17] generalizes BERT to both language understanding and generation by combining bidirectional and auto-regressive transformers for pre-training. Taking inspiration from MASS and BART, our work pursues their vision-language counterpart by pre-training a universal encoder-decoder structure and fine-tuning it on both vision-language perception and generation tasks.…”
Section: Related Work (mentioning)
confidence: 99%
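
As context for the citation statement above: BART corrupts input text and trains a bidirectional encoder plus an auto-regressive decoder to reconstruct the original. The sketch below is a minimal illustration of that denoising setup, assuming the Hugging Face transformers library and the public facebook/bart-base checkpoint (neither appears in this report); it is illustrative, not the authors' training code.

# Minimal sketch of BART-style denoising, assuming Hugging Face `transformers`
# and the `facebook/bart-base` checkpoint (illustrative choices).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Corrupt the input by masking a span; the bidirectional encoder reads the
# corrupted text and the auto-regressive decoder reconstructs the original.
corrupted = "BART combines bidirectional and <mask> transformers for pre-training."
inputs = tokenizer(corrupted, return_tensors="pt")

# Generate the denoised sequence with the decoder.
output_ids = model.generate(inputs["input_ids"], max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))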
“…By learning on this multi-class classification problem (the auxiliary task), the model can learn general features from these images that can later be used for classification (the main task). Self-supervised techniques have been applied across computer vision [31], [32], natural language processing [33], and speech recognition [34]. In anomaly detection, [35], [36] use self-supervised visual representation learning to learn the features of in-distribution (normal) samples.…”
Section: Related Work (mentioning)
confidence: 99%
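
To make the auxiliary-task idea in the statement above concrete, the sketch below uses rotation prediction, one common self-supervised multi-class proxy task. It assumes PyTorch and a torchvision ResNet-18 backbone; these are illustrative choices, not necessarily the techniques used in the cited works [31]-[36].

# Illustrative self-supervised auxiliary task: predict which of four rotations
# (0/90/180/270 degrees) was applied to an unlabeled image. Assumes PyTorch and
# torchvision; the backbone and 4-way head are hypothetical choices.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(num_classes=4)          # 4-way rotation classifier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01)

def rotate_batch(images):
    """Create rotated copies of each image and the matching rotation labels."""
    rotations, labels = [], []
    for k in range(4):                      # k * 90 degrees
        rotations.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotations), torch.cat(labels)

# One training step on a batch of unlabeled images (N, 3, H, W).
images = torch.randn(8, 3, 224, 224)
x, y = rotate_batch(images)
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()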
“…Formally, we are given a training dataset $D = \{(v_i, q_i, a_i)\}_{i=1}^{s}$, where $v_i$ denotes the $i$-th training image, $s$ is the total number of training images, and $q_i$ and $a_i$ represent the question and its corresponding answer, respectively. We use a sequence-to-sequence model composed of an encoder and a decoder, such as T5 (Raffel et al., 2020) or BART (Lewis et al., 2019). Let $\theta$ be the parameters of the model $p$ to be trained.…”
Section: Overview (mentioning)
confidence: 99%
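
A minimal sketch of the text side of that sequence-to-sequence setup follows, assuming the Hugging Face transformers library and the facebook/bart-base checkpoint (hypothetical choices for illustration). How the image $v_i$ is encoded and fused with the question is specific to the cited work and is omitted here; the code only shows the decoder learning $p_\theta(a_i \mid q_i)$ with the standard seq2seq cross-entropy loss.

# Sketch of fine-tuning an encoder-decoder model p_theta on question -> answer
# pairs; assumes Hugging Face `transformers` and `facebook/bart-base`.
# Image features (v_i) are omitted, since their fusion with the text is
# specific to each paper.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

question = "What color is the bus?"         # q_i (hypothetical example)
answer = "yellow"                           # a_i (hypothetical example)

enc = tokenizer(question, return_tensors="pt")
labels = tokenizer(answer, return_tensors="pt").input_ids

# Standard seq2seq cross-entropy: the decoder learns p_theta(a_i | q_i).
loss = model(input_ids=enc.input_ids,
             attention_mask=enc.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()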