2022
DOI: 10.48550/arxiv.2201.01549
Preprint

SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations

Abstract: Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application to SE tasks. First, the majority of the pre-trained models focus on pre-training only the encoder of the Transformer. For generation tasks that are addressed using models with the encoder-decoder architecture, however, there is no reason why the decoder should be left out duri…

Cited by 4 publications (5 citation statements) | References 40 publications

“…Baselines. While comparing the evaluation results for different tasks, we compare with large-scale pre-trained models, including GPT-2 [50], CodeGPT [43], PLBART [5], SPT-Code [45] and CodeT5 [63]. Most of our fine-tuning evaluation is on benchmarked datasets; thus, we report the available results from the CodeXGLUE leaderboard [3].…”
Section: Methods
Mentioning confidence: 99%
“…For code generation tasks, GPT-3- or BART-style models (e.g., Codex, CodeT5, PLBART, SPT-Code, etc. [5,19,45,63]) are popular. The important insight here is that, independent of the final tasks, when very high-capacity models are trained on huge code corpora to learn simple, self-supervised "busy work", they still learn general syntactic and semantic constraints of writing code.…”
Section: Introduction
Mentioning confidence: 99%
“…Researchers have been passionate about pre-training Transformer models for source code. There are three main architectures for existing models: Encoder-only [6,7,20,30,37,45,75], Decoder-only [4,26,77], and Encoder-decoder [1,11,29,36,62]. Encoder-only models are commonly pre-trained with cloze tasks (e.g., masked language model) and sequence understanding tasks (e.g., next statement prediction).…”
Section: Related Work
Mentioning confidence: 99%
“…Encoder-only models are commonly pre-trained with cloze tasks (e.g., masked language model) and sequence understanding tasks (e.g., next statement prediction). Decoder-only models are mostly trained with an autoregressive, left-to-right language model (LM). Encoder-Decoder models are pre-trained with different tasks, including denoising autoencoding to reconstruct wrongly permuted tokens [1], predicting missing identifiers [76], recovering method names [62], etc. In recent years, with the rapid development of computing devices such as GPUs and TPUs, researchers have also shed light on the incredible power of extremely large Transformer models (up to hundreds of billions of parameters) for understanding and generating code [4,26,29,32].…”
Section: Related Work
Mentioning confidence: 99%
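
The quoted passage above mentions cloze tasks (masked language modeling) as a common encoder-only pre-training objective. Below is a minimal PyTorch sketch of such an objective over code token ids, included only as an illustration; the TinyCodeEncoder class, vocabulary size, special token ids, and hyperparameters are assumptions for the sketch, not SPT-Code's or any cited model's actual setup.

# Minimal illustrative sketch (not from SPT-Code): a cloze-style masked
# language modeling (MLM) objective over code token ids, the kind of
# encoder-only pre-training task the quoted passage describes. Vocabulary
# size, special token ids, and model dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, PAD_ID, MASK_ID = 1000, 0, 1

class TinyCodeEncoder(nn.Module):
    """A small Transformer encoder with a token-prediction head."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, token_ids):
        return self.lm_head(self.encoder(self.embed(token_ids)))

def mlm_loss(model, token_ids, mask_prob=0.15):
    # Randomly replace a fraction of non-padding tokens with [MASK] and ask
    # the encoder to reconstruct them; only masked positions are scored.
    mask = (torch.rand(token_ids.shape) < mask_prob) & (token_ids != PAD_ID)
    corrupted = token_ids.masked_fill(mask, MASK_ID)
    logits = model(corrupted)                    # (batch, seq, vocab)
    labels = token_ids.masked_fill(~mask, -100)  # -100 is ignored below
    return nn.functional.cross_entropy(
        logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100)

# Toy usage: random ids stand in for a tokenized source-code snippet.
model = TinyCodeEncoder()
batch = torch.randint(2, VOCAB_SIZE, (4, 32))    # 4 sequences, 32 tokens each
loss = mlm_loss(model, batch)
loss.backward()
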
“…We also set the beam size to 250, while CURE's beam is configured to 1000. According to previous work [58,76], a larger training set and beam size may…”
Section: RQ2: What Is the Performance of a Single CIRCLE Model Compar...
Mentioning confidence: 99%