2013
DOI: 10.1016/j.scico.2012.11.008

Viewing functions as token sequences to highlight similarities in source code

Abstract: The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. The latter can be a challenging problem, since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique not only based on toke…
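
To make the idea of viewing functions as token sequences concrete, here is a minimal sketch. It is not the technique proposed in the paper, whose description is truncated above. It assumes Python input, replaces every identifier and literal with a placeholder, and compares the resulting sequences with the standard library's `difflib.SequenceMatcher`. The names `token_sequence` and `similarity` are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry no lexical content for this comparison.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_sequence(source: str) -> list[str]:
    """Turn Python source into a list of abstract tokens.

    Identifiers, numbers and strings are replaced by placeholders so that
    renaming variables or changing constants does not alter the sequence.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)  # keywords, operators, punctuation
    return tokens


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the token sequences of a and b."""
    matcher = difflib.SequenceMatcher(
        None, token_sequence(a), token_sequence(b), autojunk=False
    )
    return matcher.ratio()
```

With this normalisation, two functions that differ only in their identifier names produce identical token sequences. Edits such as code transposition or function inlining, which the abstract lists as harder cases, are not handled by this simple comparison.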

Cited by 7 publications (4 citation statements)
References 24 publications
“…Analysis by applying the duplicate code model of this study, we first construct the similarity matrix as shown in figure 1. After constructing the matrix, convert the similarity distance into a transaction set, and extract the set with the file number 1 as shown in the following code: (2,86), (3,27), (4,40), (5,45), (6,35), (7,28), (8,40), (9,45), (10,122), (11,62), (12,53), (13,56), (14,141), (15,149), (16,56), (17,54), (18,84), (19,69), (20,83) From the code above we can see that the obtained item set still contains too much information. The corresponding file data set details are shown in Figure 2.…”
Section: Experimental Test (mentioning)
confidence: 99%
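
The extraction step described in this citation statement can be sketched as follows. This is an illustrative reading of the quoted text, not the citing authors' implementation. It assumes the similarity matrix is a square list of lists indexed by 1-based file number, and `transactions_for_file` is a hypothetical name. In the sample matrix, only the first row reuses distances quoted above; the remaining entries are placeholder values.

```python
def transactions_for_file(matrix, file_no):
    """Return (other_file, distance) pairs for a 1-based file number.

    The file's own entry on the diagonal is left out.
    """
    row = matrix[file_no - 1]
    return [(j + 1, d) for j, d in enumerate(row) if j + 1 != file_no]


# Row 1 uses the first three distances from the quoted statement.
# All other off-diagonal values are placeholders for illustration only.
matrix = [
    [0, 86, 27, 40],
    [86, 0, 50, 60],
    [27, 50, 0, 70],
    [40, 60, 70, 0],
]

pairs = transactions_for_file(matrix, 1)
print(pairs)  # [(2, 86), (3, 27), (4, 40)]
```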
“…T1: [1,24,34,46,59,65,81,17,62,97] T2: [2,25,43,47,49,57,60,66,71,82,85,18,38,69] T3: [3,35,58,61,68,80,83,19,50,75] T4: [4,14,30,8] T5: [5,29,15,44,79,89,91,9,52] T6: [6,16,92,11,53] T7: [7,20,31,54] T8: [8,51,14,30,4] T9:…”
Section: Experimental Test (mentioning)
confidence: 99%
“…Zhuo Li et al [9] combined the dynamic text matching algorithm with suffix tree algorithm for similitude code within source files, achieved a similar code detection tool, actually united the method of abstract syntax tree. Michel Chilowicz et al [10] through the factorization of the function call graphs, detected the similarity of source code from the function level. Sharma A et al [11,12] determined the similarity of two functions according to the similarity of the internal operating instructions, and eventually get the similarity of the two applications.…”
Section: Introduction (mentioning)
confidence: 99%