Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/775

Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code

Abstract: As the body of research on machine narrative comprehension grows, there is a critical need for consideration of performance assessment strategies as well as the depth and scope of different benchmark tasks. Based on narrative theories, reading comprehension theories, as well as existing machine narrative reading comprehension tasks and datasets, we propose a typology that captures the main similarities and differences among assessment tasks; and discuss the implications of our typology for new task design and …


Cited by 14 publications (8 citation statements)
References 10 publications
“…Recent research has focused on using pretrained neural language models (LMs) in natural language processing (NLP) to automate code generation tasks using large-scale code corpus data from open-source repositories [23,43,25]. Notable examples of these pretrained models include CodeBERT [11] with encoder-only, CodeGPT [23] with decoder-only, as well as PLBART [1] and CodeT5 [40] with encoder-decoder transformer architectures.…”
Section: Pretrained Models for Code Generation
Mentioning confidence: 99%
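As a minimal illustrative sketch (not drawn from the survey or the citing papers), the snippet below shows one common way such a pretrained encoder-decoder model can be loaded for code-related generation with the Hugging Face transformers library; the CodeT5 checkpoint name, the prompt, and the decoding settings are assumptions chosen for illustration.

# Hedged sketch: load a pretrained encoder-decoder code model (CodeT5) and
# generate text from a code prompt. The checkpoint, prompt, and decoding
# settings below are illustrative assumptions, not taken from the cited papers.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# An example natural-language-plus-code prompt for the model to complete.
prompt = "summarize: def add(a, b): return a + b"
inputs = tokenizer(prompt, return_tensors="pt")

# Beam-search decoding with a small output budget for the sketch.
outputs = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))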
“…These are also the SE tasks that are typically used to evaluate pre-trained models of source code. Following previous work [27], in the first two columns, we classify each task along two dimensions: (1) whether the task concerns understanding (Und.) or generation (Gen.); and…”
Section: A. SE Tasks
Mentioning confidence: 99%
“…Within each group, we order them chronologically (by the date of the preprint or the official publication). To enable the reader to better understand their similarities and differences, we categorize the PTMs of source code (i.e., PTM-Cs and CodePTMs) along the four dimensions proposed by Niu et al. [27]:…”
Section: Pre-trained Models
Mentioning confidence: 99%