Abstract: The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. The latter can be a challenging problem, since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique not only based on toke…
“…Applying the duplicate-code model of this study, we first construct the similarity matrix shown in Figure 1. After constructing the matrix, we convert the similarity distances into a transaction set and extract the set for file number 1, shown below: (2,86), (3,27), (4,40), (5,45), (6,35), (7,28), (8,40), (9,45), (10,122), (11,62), (12,53), (13,56), (14,141), (15,149), (16,56), (17,54), (18,84), (19,69), (20,83). From this set we can see that the extracted item set still contains too much information. The corresponding file data set details are shown in Figure 2.…”
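As a hedged illustration of the step described in this excerpt, the sketch below converts a pairwise similarity-distance matrix into (file, distance) transaction pairs for one file; the matrix values and the transactions_for_file helper are hypothetical, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): converting a pairwise
# similarity-distance matrix into a transaction set for one file, as the
# excerpt describes. The matrix values and file numbering are illustrative.

def transactions_for_file(distance_matrix, file_id):
    """Return (other_file, distance) pairs for the given file (1-indexed)."""
    row = distance_matrix[file_id - 1]
    return [(other + 1, d) for other, d in enumerate(row) if other + 1 != file_id]

# Toy 3x3 matrix; a real matrix would hold distances between every pair of files.
matrix = [
    [0, 86, 27],
    [86, 0, 40],
    [27, 40, 0],
]
print(transactions_for_file(matrix, 1))  # [(2, 86), (3, 27)]
```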
“…Li Siyu proposed an intermediate-representation code similarity detection method [14]. Michel Chilowicz et al. combined function call graphs with word-sequence matching to detect code similarity [15,16]. Tokenization-based analysis has also been reported in the literature [17].…”
To improve the efficiency and accuracy of program source code similarity detection, this work improves on existing detection methods by addressing some deficiencies in current research. A similar-code detection model based on frequent item sets is proposed. The model builds frequent item set data to discover collections of repetitive code and to automatically assign file similarity attribution. The model does not need to consider the type of code during detection and therefore has wide applicability: it can detect code files written in different programming languages and grammars, and it can also mark similar code and report statistics on the results. Experimental comparison shows that the model achieves high accuracy and processing efficiency.
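A minimal sketch of the frequent item set idea follows, assuming an Apriori-style support count over transactions of files flagged as similar together; the transaction contents and the min_support threshold are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch, not the paper's algorithm: counting frequent itemsets of
# files that repeatedly appear together in similarity transactions, in the
# spirit of Apriori. Transactions and the support threshold are illustrative.
from itertools import combinations
from collections import Counter

transactions = [
    {"f1", "f2", "f5"},   # files flagged as mutually similar in one pass
    {"f1", "f2"},
    {"f2", "f5"},
    {"f1", "f2", "f5"},
]
min_support = 2

counts = Counter()
for t in transactions:
    for size in (1, 2, 3):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

# Itemsets of files that co-occur at least min_support times.
frequent = {iset: c for iset, c in counts.items() if c >= min_support}
print(frequent)
```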
“…Zhuo Li et al. [9] combined a dynamic text-matching algorithm with a suffix-tree algorithm to find similar code within source files, implementing a similar-code detection tool that also incorporates the abstract syntax tree method. Michel Chilowicz et al. [10] detected source code similarity at the function level through factorization of function call graphs. Sharma A et al. [11,12] determined the similarity of two functions from the similarity of their internal operating instructions, and from that derived the similarity of the two applications.…”
The main purpose of this study is to find code that is likely to be duplicated, so as to avoid the adverse effects of code duplication. Document feature information is first clustered as a pretreatment step to extract the relevant features of each document. These basic features are then used to cluster the documents and find the best number of clusters. With a reasonable cluster count determined, vectors generated by the TF-IDF method are combined with the K-means clustering algorithm to distinguish file contents, and cosine similarity is introduced to measure the similarity of two texts and classify parallel documents. On the test data set, the method accurately finds code that is likely to be duplicated and works quite well.
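A minimal sketch of the described pipeline, assuming scikit-learn as the tooling (the excerpt does not name a library): TF-IDF vectors, K-means clustering, and cosine similarity between two documents; the sample snippets and cluster count are placeholders.

```python
# Minimal sketch of the described pipeline using scikit-learn (an assumed
# tool choice, not necessarily what the authors used): TF-IDF vectors,
# K-means clustering, and cosine similarity between two documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder "documents"; in practice these would be source code files.
documents = [
    "int add(int a, int b) { return a + b; }",
    "int sum(int x, int y) { return x + y; }",
    "void print_hello() { printf(\"hello\"); }",
]

vectors = TfidfVectorizer().fit_transform(documents)

# Cluster the documents; the cluster count would normally come from the
# pretreatment step described above.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Cosine similarity between the first two documents.
sim = cosine_similarity(vectors[0], vectors[1])[0][0]
print(labels, sim)
```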