2020
DOI: 10.3390/app10217519

A Source Code Similarity Based on Siamese Neural Network

Abstract: Finding similar code snippets is a fundamental task in software engineering. Several approaches to this task use statistical language models, which focus on the syntax and structure of code rather than on the deep semantic information underlying it. In this paper, a Siamese Neural Network is proposed that maps code into continuous-space vectors and tries to capture its semantic meaning. Firstly, an unsupervised pre-trained method that models code snippets as a weighted series of…
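The twin-encoder idea in the abstract can be illustrated with a minimal sketch. It is not the paper's model: a fixed bag-of-tokens encoder stands in for the learned network, and the names `tokenize`, `encode`, `cosine`, and `similarity` are hypothetical. The only property it shares with a Siamese network is that both snippets pass through the same encoder before their vectors are compared.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Split source text into identifiers, numbers, and single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def encode(code):
    # Shared encoder: both inputs go through this same function.
    # Assumption: a token-count vector stands in for a learned embedding.
    return Counter(tokenize(code))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(x, y):
    # Encode each snippet with the shared encoder, then compare the vectors.
    return cosine(encode(x), encode(y))
```

In this sketch, two loops that differ only in variable names score higher than a loop paired with an unrelated statement, but only because they share surface tokens. Capturing semantic similarity beyond token overlap is what a trained encoder would be for.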

Cited by 15 publications (3 citation statements)
References 35 publications
“…Due to deep learning technology and activation functions, deep metric learning, as a combination of deep learning and metric learning, has provided excellent solutions in many classification tasks and attracted researchers’ attention in academia and industry. In the Humpback Whale Identification competition held on the Kaggle platform, which is the world’s largest data science community [ 49 ], the top five participating teams’ solutions all applied deep metric learning models: Triplet neural network [ 50 ] and siamese neural network [ 51 ]. The most conspicuous characteristic of these networks is the sharing weights, which makes the samples related because the triplet neural network can simultaneously learn both positive and negative distances and the number of training data combinations increases significantly to avoid overfitting.…”
Section: Methods (mentioning)
confidence: 99%
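The statement above notes that a triplet network learns positive and negative distances at the same time. A common way to express that is the triplet margin loss, sketched here under assumptions: Euclidean distance, a margin of 1.0, and plain Python lists as embeddings. The cited work's exact formulation is not given in this excerpt.

```python
import math


def euclidean(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Zero once the positive is closer to the anchor than the negative
    # by at least `margin`; otherwise the loss grows with the violation.
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Each training example pairs one anchor with one positive and one negative, so the number of usable combinations grows much faster than the number of labelled samples.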
“…They started to be used by Bromley et al [19] in signature verification work. They have also been used for different kinds of problems, such as the evaluation of source code similarity [20], cyber attack detection [21], object tracking [22], chromosome classification [23] and even animal sound classification [24]. More recently and regarding the grocery products, Ciocca et al [25] have applied a Siamese network to capture the relations between iconic and natural images in the Grocery Store Dataset [3].…”
Section: Solving the One-shot Learning Problem (mentioning)
confidence: 99%
“…In some of the recent studies, researchers have highlighted the potential use of information retrieval with learning-based approaches to detect type-4 clones. Xie et al [53] used the TF-IDF approach with the Siamese neural network, while Fu et al [54] combined a Continuous Bag of Words (CBOW)/Skip Gram (SG) model with ensemble learning and reported improved accuracy measures as compared to other solo approaches. Some researchers have used LSI to further examine the clones, which have already been detected by some clone detection tools.…”
Section: Introduction (mentioning)
confidence: 99%
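For readers unfamiliar with the TF-IDF weighting mentioned in the statement above, here is a minimal sketch. It assumes the plain, unsmoothed variant (term frequency times log of inverse document frequency) applied to pre-tokenized snippets. The function name `tf_idf` is hypothetical, and the weighting actually used in the cited work may differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    # docs: list of token lists. Returns one {token: weight} dict per document.
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights
```

Tokens that occur in every snippet, such as a shared keyword, receive weight zero, while tokens specific to one snippet are weighted up. Vectors of this kind could then serve as input to a similarity model.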