Learning to represent programs with heterogeneous graphs

Zhang, Kechi; Wang, Wenhan; Zhang, Huangzhao; Li, Ge; Jin, Zhi

doi:10.1145/3524610.3527905

Cited by 42 publications

(23 citation statements)

References 31 publications

“…A crucial aspect of deep learning for code assistance is the choice of representation and model architecture. Recent approaches have explored grammar-based representations [1], graph-based representations [2], and hybrid models that integrate symbolic reasoning with deep learning [3]. These methods have shown promise in tasks such as code completion, repair, refactoring, and optimization.…”

Section: Deep Learning Methods For Code 21 Representations and Model ... (mentioning)

confidence: 99%

“…Grammar-based representations utilize the syntactic structure of programming languages to build deep learning models capable of understanding and generating code. One approach involves the use of Abstract Syntax Trees (ASTs) [1], which provide a hierarchical representation of code, capturing its structure and semantics. AST-based models have demonstrated success in tasks such as code completion and bug detection.…”

Section: Grammar-based Representations (mentioning)

confidence: 99%

Harnessing Deep Learning for Efficient and Responsible AI Code Assistants: A Comprehensive Study of Methods, Evaluation, and Human Interaction

Srivastava

2023

Preprint

The increasing complexity of software development has led to a growing need for intelligent code assistance tools that can aid developers in various tasks, such as code generation, translation, and repair. Deep learning has emerged as a promising approach to addressing this need. This paper presents a comprehensive study of deep learning methods, evaluation metrics, and human interaction techniques for AI code assistants, aiming to provide a solid foundation for researchers and practitioners in the field. We discuss the importance of responsible AI in the context of code assistance, focusing on fairness, security, robustness, and privacy. Furthermore, we highlight the significance of open science practices in promoting reproducibility and transparency in the development of deep learning models for code. Finally, we outline potential future research directions in the area of deep learning for code assistance.

“…We also think that the source code before and after source code modeling is also symmetric. There exists a lot of excellent literature on source code analysis [46][47][48][49][50][51][52][53][54][55][56][57][58], and we have shown the representative source code models from the past five years in Table 1. From Table 1, we know that scholars have considered structural information (e.g., AST, API) in recent years, instead of just treating the source code as pure text.…”

Section: Source Code Modeling (mentioning)

confidence: 99%

“…According to the different representations of source code, we divide the source code modeling into four types: Token-based, Tree-based, Graph-based, and other source code modeling.
[40] IJCAI AST CCD LSTM
[6] ICPC AST SCS Seq2seq
[17] IJCAJ × (API, comments) (API, code, comments) SCS LSTM
[19] ASE AST, source code SCS Bi-LSTM
[43] ICLR AST, (token, path, token) PF, SCS HOPE
[50] MSR identifier, AST, CFG, Bytecode CCD RNN
[51] ICLR variable/state trace PR Word2vec
[53] ESEC abstracted symbolic traces ECM GGNN
[55] ICLR AST, PDG PF MLP
[62] ASE × tokens SCS RNN
[63] AAAI AST SCS, SCC GRU
[22] ICSE text, AST node tokens SCS Bi-LSTM
[46] ICSE AST, ST-trees SCC, CCD Bi-LSTM
[64] POPL AST, (token, path, token) PF Bi-LSTM
[65] ASE text SCS Bi-LSTM
[23] ICSE × AST, code sequence SCS BERT
[24] Access functional keywords SCS Transformer
[25] arXiv × comments, code SCS, CR Transformer
[26] ACL × AST SCS GRU
[44] ESE tokens, AST SCS Regularizer
[56] IST AST SCG GRU
[66] JCRD (code, API, comments), (function, comments) SCS GRU
[67] ACL text, AST node tokens SCS GNN
[68] arXiv AST, context SCS Seq2Seq
[69] ICPC × (seq, comment), (context, comment) SCS API2Com
[70] ICPC AST, API, seq SCS…”

Section: Source Code Modeling (mentioning)

confidence: 99%

A Survey of Automatic Source Code Summarization

Zhang, Wang, Zhou et al.

2022

Symmetry

Source code summarization refers to the natural language description of the source code’s function. It can help developers easily understand the semantics of the source code. We can think of the source code and the corresponding summarization as being symmetric. However, the existing source code summarization is mismatched with the source code, missing, or out of date. Manual source code summarization is inefficient and requires a lot of human efforts. To overcome such situations, many studies have been conducted on Automatic Source Code Summarization (ASCS). Given a set of source code, the ASCS techniques can automatically generate a summary described with natural language. In this paper, we give a review of the development of ASCS technology. Almost all ASCS technology involves the following stages: source code modeling, code summarization generation, and quality evaluation. We further categorize the existing ASCS techniques based on the above stages and analyze their advantages and shortcomings. We also draw a clear map on the development of the existing algorithms.

“…Nevertheless, employing the resulting retrained models (henceforth PTM-Cs) for SE tasks is not ideal, as there are code-specific characteristics that may not be properly taken into account by these models, such as the syntactic [17], [18] and semantic structures [19] inherent in source code [20]. Consequently, SE researchers have developed a number of pre-trained models of source code (henceforth CodePTMs) that take into account code-specific characteristics in the past few years [21]- [26].…”

Section: Introduction (mentioning)

confidence: 99%

An Empirical Comparison of Pre-Trained Models of Source Code

Niu, Li, Ng et al.

2023

Preprint

While a large number of pre-trained models of source code have been successfully developed and applied to a variety of software engineering (SE) tasks in recent years, our understanding of these pre-trained models is arguably fairly limited. With the goal of advancing our understanding of these models, we perform the first systematic empirical comparison of 19 recently-developed pre-trained models of source code on 13 SE tasks. To gain additional insights into these models, we adopt a recently-developed 4-dimensional categorization of pretrained models, and subsequently investigate whether there are correlations between different categories of pre-trained models and their performances on different SE tasks.
