Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering 2016
DOI: 10.1145/2970276.2970326

Deep learning code fragments for code clone detection

Cited by 502 publications (334 citation statements)
References 60 publications
“…The key challenge is to accurately represent the structure of code changes, which are not contiguous text like the commit message, but rather amount to scattered fragments of removed and added code across multiple files, within multiple hunks. Thus, different from existing deep learning techniques working on source code [24], [36], [66], [68], PatchNet constructs separate embedding vectors representing the removed code and the added code in each hunk of each affected file in the given patch. The information about a file's hunks are then concatenated to build an embedding vector for the affected file.…”
Section: Introduction (mentioning)
confidence: 99%
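The per-hunk concatenation described in this statement can be illustrated with a minimal sketch. It is not PatchNet's implementation: the hashed bag-of-tokens `embed_lines` is a stand-in for a learned encoder, and `DIM`, `embed_hunk`, and `embed_file` are hypothetical names chosen for illustration. The sketch only shows the shape of the idea, namely separate vectors for removed and added code in each hunk, concatenated into one vector for the file.

```python
import hashlib
from typing import List, Tuple

DIM = 4  # assumed toy embedding size; a real model would learn a much larger one

Hunk = Tuple[List[str], List[str]]  # (removed lines, added lines)


def embed_lines(lines: List[str]) -> List[float]:
    """Toy stand-in for a learned encoder: hashed bag of whitespace tokens."""
    vec = [0.0] * DIM
    for line in lines:
        for tok in line.split():
            # Stable hash so results do not vary between interpreter runs.
            idx = hashlib.md5(tok.encode("utf-8")).digest()[0] % DIM
            vec[idx] += 1.0
    return vec


def embed_hunk(removed: List[str], added: List[str]) -> List[float]:
    """Separate vectors for removed and added code, placed side by side."""
    return embed_lines(removed) + embed_lines(added)


def embed_file(hunks: List[Hunk]) -> List[float]:
    """Concatenate the hunk vectors into one vector for the affected file."""
    out: List[float] = []
    for removed, added in hunks:
        out.extend(embed_hunk(removed, added))
    return out
```

In this sketch the file vector grows with the number of hunks (2 × `DIM` entries per hunk); how a real model fixes or pads that length is outside what the quoted statement says.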
“…As our approach based on code … For clone detection, many techniques in the literature generally begin by generating some intermediate representations for code before measuring similarity. According to source code representation, these techniques can be classified as text-based (e.g., [38]-[40]), token-based (e.g., [41]-[43]), tree-based (e.g., [24], [44], [45]), graph-based (e.g., [46]-[49]), semantic-based (e.g., [50]-[53]), deep-learning-based (e.g., [35], [54]), or a mixture. Our approach complements those studies by applying word embedding to smart contract code and its syntax structures to search for smart contracts of various levels of granularity.…”
Section: Clone Detection Bug Detection and Code Validation (mentioning)
confidence: 99%
“…In this paper, we build FA-AST for Java programs and evaluate FA-AST and graph neural networks on two code clone datasets: Google Code Jam dataset collected by [6] and the widely used clone detection benchmark BigCloneBench [9]. The results show that our approach outperforms most existing clone detection approaches, especially several AST-based deep learning approaches including RtvNN [2], CDLH [3] and ASTNN [4].…”
Section: Introduction (mentioning)
confidence: 96%
“…Most of these approaches include two steps: use neural networks to calculate a vector representation for each code fragment, then calculate the similarity between two code vector representations to detect clones. To leverage the explicit structural information in programs, these approaches often use abstract syntax tree (AST) as the input of their models [2]-[4]. A typical example of these approaches is CDLH [3], which encode code fragments by directly applying Tree-LSTM [5] on binarized ASTs.…”
Section: Introduction (mentioning)
confidence: 99%
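The two-step pipeline in this statement (encode each fragment as a vector, then compare vectors) can be sketched as follows. This is an illustration under stated assumptions, not any cited system: `embed` is a plain bag-of-tokens count standing in for the neural encoder, cosine similarity is one common choice of comparison, and the `threshold` value of 0.8 is arbitrary.

```python
import math
import re
from collections import Counter


def embed(code: str) -> Counter:
    """Step 1 (stand-in): represent a fragment as lexical token counts.

    The cited approaches use a neural network over tokens or ASTs here;
    a bag of tokens is used only to keep the sketch self-contained.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    return Counter(tokens)


def cosine(u: Counter, v: Counter) -> float:
    """Step 2: cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_clone(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair as a clone when similarity reaches the assumed threshold."""
    return cosine(embed(a), embed(b)) >= threshold
```

A bag of tokens ignores order and structure, which is the gap the AST-based encoders mentioned in the statement are meant to close; only the encode-then-compare shape carries over.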