Scalable clone detection using description logic

Schügerl, Philipp

doi:10.1145/1985404.1985413

Cited by 10 publications

(3 citation statements)

References 22 publications

“…Syntactic approaches [19,20,48,50,23] use a parser to convert source programs into parse trees or abstract syntax trees, which can then be processed using either tree matching or structural metrics to find clones. Lee et al. [36] proposed a multi-dimensional token-level indexing structure using an R* tree on Deckard's vectors [31].…”
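The "structural metrics" route mentioned in this statement can be illustrated with a Deckard-style characteristic vector. The sketch below is a simplified stand-in, not Deckard's or Lee et al.'s implementation: it assumes Python fragments parsed with the standard `ast` module, counts every AST node type into a sparse vector, and compares vectors by Manhattan distance with a brute-force pairwise check where Lee et al. would query an R* tree index. The helper names `characteristic_vector` and `manhattan` are hypothetical.

```python
import ast
from collections import Counter


def characteristic_vector(source):
    """Count how often each AST node type occurs in a code fragment."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def manhattan(u, v):
    """L1 distance between two sparse count vectors (missing keys count as 0)."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


frag1 = "def f(x):\n    return x + 1\n"
frag2 = "def g(y):\n    return y + 2\n"   # renamed identifiers, different literal
frag3 = "for i in range(10):\n    print(i)\n"

v1, v2, v3 = (characteristic_vector(s) for s in (frag1, frag2, frag3))
print(manhattan(v1, v2))  # 0: identical node-type counts
print(manhattan(v1, v3))  # > 0: structurally different
```

Renaming identifiers and changing literals leaves the node-type counts untouched, so `frag1` and `frag2` (a Type-2 clone pair) end up at distance 0, while the loop in `frag3` does not.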

Section: Related Work (mentioning; confidence: 99%)

WuKong: a scalable and accurate two-phase approach to Android app clone detection

Wang, Guo, et al. 2015

Proceedings of the 2015 International Symposium on Software Testing and Analysis

175

106

Repackaged Android applications (app clones) have been found in many third-party markets, which not only compromise the copyright of original authors, but also pose threats to security and privacy of mobile users. Both fine-grained and coarse-grained approaches have been proposed to detect app clones. However, fine-grained techniques employing complicated clone detection algorithms are difficult to scale to hundreds of thousands of apps, while coarse-grained techniques based on simple features are scalable but less accurate. This paper proposes WuKong, a two-phase detection approach that includes a coarse-grained detection phase to identify suspicious apps by comparing light-weight static semantic features, and a fine-grained phase to compare more detailed features for only those apps found in the first phase. To further improve the detection speed and accuracy, we also introduce an automated clustering-based preprocessing step to filter third-party libraries before conducting app clone detection. Experiments on more than 100,000 Android apps collected from five Android markets demonstrate the effectiveness and scalability of our approach.
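The two-phase idea in this abstract can be sketched as a cheap filter followed by an expensive check that runs only on the pairs the filter lets through. Everything concrete here is an assumption for illustration: the app records and feature names, the Manhattan distance over per-API call counts used for the coarse phase, the Jaccard similarity over sets of detailed features used for the fine phase, and both thresholds. WuKong's actual features and its library-filtering preprocessing step are not reproduced.

```python
from itertools import combinations

# Hypothetical app records: "coarse" = per-API call counts, "fine" = detailed feature set
apps = {
    "app_a": {"coarse": {"sendSMS": 3, "getDeviceId": 1}, "fine": {"m1", "m2", "m3", "m4"}},
    "app_b": {"coarse": {"sendSMS": 3, "getDeviceId": 2}, "fine": {"m1", "m2", "m3", "m5"}},
    "app_c": {"coarse": {"openCamera": 7}, "fine": {"m9"}},
}


def coarse_distance(a, b):
    """Cheap comparison: L1 distance between call-count dictionaries."""
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in a.keys() | b.keys())


def jaccard(a, b):
    """Expensive comparison stand-in: Jaccard similarity of feature sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def two_phase(apps, coarse_max=2, fine_min=0.5):
    # Phase 1: cheap check over all pairs, keeping only suspicious ones
    suspicious = [
        (x, y) for x, y in combinations(sorted(apps), 2)
        if coarse_distance(apps[x]["coarse"], apps[y]["coarse"]) <= coarse_max
    ]
    # Phase 2: detailed comparison, run only on the suspicious pairs
    return [
        (x, y) for x, y in suspicious
        if jaccard(apps[x]["fine"], apps[y]["fine"]) >= fine_min
    ]


print(two_phase(apps))  # [('app_a', 'app_b')]
```

Only `app_a` and `app_b` survive the coarse phase (distance 1), so the fine-grained comparison runs once instead of three times. The coarse pass here is still all-pairs; the saving comes from how much cheaper it is per pair.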

“…CD-Form [42,43], Shuffling Framework [90,91,92], Abd-El-Hafiz [119], Kam1n0 [66], Lavoie and Merlo [80], CLCMiner [111], DL-Clone [123]. Unknown (1): SHINOBI [76,77]. Rank C (Quality Reference Corpus): The recall of the tool/technique was evaluated using a well-constructed reference corpus built by the tool/technique authors themselves. To achieve this rank, the authors must have taken care to build a quality reference corpus.…”
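Rank C concerns how recall was measured against a reference corpus. As a minimal sketch, assuming clone pairs are given as pairs of fragments `(file, start_line, end_line)` and that a detected pair matches a reference pair only when both fragments are identical (real benchmarks typically use a more tolerant matching criterion), recall is the fraction of reference clone pairs the tool reports. The function names and the sample corpus are hypothetical.

```python
def normalize(pair):
    """Order a clone pair canonically so (A, B) and (B, A) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def recall(detected, reference):
    """Fraction of reference clone pairs the tool reported (exact fragment match)."""
    ref = {normalize(p) for p in reference}
    found = {normalize(p) for p in detected}
    return len(ref & found) / len(ref) if ref else 0.0


# Hypothetical corpus: fragments are (file, start_line, end_line)
reference = [
    (("A.java", 1, 10), ("B.java", 5, 14)),
    (("C.java", 20, 30), ("D.java", 2, 12)),
]
detected = [
    (("B.java", 5, 14), ("A.java", 1, 10)),  # first reference pair, reported in reverse order
    (("E.java", 1, 5), ("F.java", 1, 5)),    # not in the corpus; does not count toward recall
]
print(recall(detected, reference))  # 0.5
```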

Section: Simple Reference Corpus (mentioning; confidence: 99%)

A Survey on the Evaluation of Clone Detection Performance and Benchmarking

Svajlenko, Roy 2020

Preprint

“…It is inspired by the concepts of functional programming. It is used as a scalable approach for data intensive applications [26,27]. It is based on two higher-order functions: Map and Reduce.…”
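The Map and Reduce functions mentioned in this excerpt can be shown with a single-process simulation. This is a sketch, not a distributed implementation and not the authors' code: the hypothetical `run_mapreduce` stands in for the framework (map, group-by-key shuffle, reduce), and the job is the textbook inverted-index construction over hypothetical tokenized code blocks, the kind of index token-based clone detectors query.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce framework: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())  # reduce phase


def index_mapper(record):
    block_id, tokens = record
    for token in set(tokens):  # emit each distinct token once per block
        yield token, block_id


def index_reducer(token, block_ids):
    return token, sorted(block_ids)


blocks = [
    ("b1", ["int", "x", "=", "0"]),
    ("b2", ["int", "y", "=", "1"]),
    ("b3", ["return", "x"]),
]
index = run_mapreduce(blocks, index_mapper, index_reducer)
print(index["int"])  # ['b1', 'b2']
print(index["x"])    # ['b1', 'b3']
```

Because the mapper and reducer are pure functions of their inputs, a real framework can run many copies of each in parallel; only the shuffle needs coordination.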

Section: Parallelizing the Approach (mentioning; confidence: 99%)

A parallel and efficient approach to large scale clone detection

Sajnani, Saini, Lopes 2015

Journal of Software: Evolution and Process

We propose a new token-based approach for large-scale code clone detection, based on a filtering heuristic that reduces the number of token comparisons when two code blocks are compared. We also present a MapReduce-based parallel algorithm that uses the filtering heuristic and scales to thousands of projects. The filtering heuristic is generic and can also be used in conjunction with other token-based approaches. In that context, we demonstrate how it can increase the retrieval speed and decrease the memory usage of index-based approaches. In our experiments on 36 open source Java projects, we found that: (i) filtering reduces token comparisons by a factor of 10, thereby increasing the speed of clone detection by a factor of 1.5; (ii) the speed-up and scale-up of the parallel approach using filtering is near-linear on a cluster of 2-32 nodes for 150-2800 projects; and (iii) filtering halves the memory usage of the index-based approach and decreases the search time by a factor of 5. The presented approach is very general and can be used with other similarity functions such as Jaccard and cosine.

…Algorithm 5: Clone detection using efficient index-based Index search. As in the naive approach, given a query block b1, detectClones() here also logically consists of two steps: (i) Fetch the candidate blocks: the terms in b1 are first ordered using the globalTermPositionMap. Next, each term in the prefix of the ordered b1 is searched in the partial index to retrieve the block ids of the candidate blocks. These block ids are added to candidatesList (lines 21-30, Algorithm 5). Unlike the naive approach, no similarity score is calculated here, because the partial index does not index all the terms of a block, and the similarity calculation requires all the terms of the candidate code block.
In order to address this issue, we create another index that stores all the terms of a block id. We call this index a forward index because its purpose is exactly opposite to that of an inverted index. A forward index, when queried with a code block id, returns all the terms in that code block, whereas an …

…heuristic to improve index-based approaches. We demonstrated that filtering can indeed reduce the index size by half and decrease the search time by a factor of 5.5. Our parallel algorithm using the filtering technique scales efficiently to thousands of projects and demonstrated near-linear speed-up and scale-up. Moreover, its MapReduce-based implementation has inherent advantages, such as load balancing, data replication, and fault tolerance, over in-house distributed solutions, where these concerns must be handled explicitly. Support for replicating the study. We have made available the input dataset, tools, generated output, and the detailed steps to replicate the study at http://mondego.ics.uci.edu/projects/clonedetection. The web page has all the 36 subj...
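The candidate-fetching step, the partial index, and the forward index described in this excerpt can be sketched together. This is a simplified reconstruction under stated assumptions, not the authors' Algorithm 5: blocks are treated as token sets rather than bags, similarity is assumed to be the overlap |A∩B| / max(|A|, |B|) with a hypothetical threshold θ = 0.8, and the global term order (playing the role of globalTermPositionMap) is assumed to put rarer tokens first. Under those assumptions, two blocks that reach the threshold must share a token within their first |b| - ⌈θ·|b|⌉ + 1 ordered tokens, so only that prefix is indexed; the forward index then supplies full token sets for verifying candidates. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction

THETA = Fraction(8, 10)  # hypothetical similarity threshold (0.8), exact arithmetic


def prefix_len(n):
    # Number of leading tokens that must be checked so no pair with similarity >= THETA is missed
    return n - math.ceil(THETA * n) + 1


def build_indexes(blocks):
    """blocks: block id -> set of tokens. Returns (term positions, partial index, forward index)."""
    freq = Counter(t for tokens in blocks.values() for t in tokens)
    # Assumed global order: rarest tokens first, ties broken alphabetically
    position = {t: i for i, t in enumerate(sorted(freq, key=lambda t: (freq[t], t)))}
    partial = defaultdict(set)  # token -> ids of blocks whose prefix contains it
    forward = {}                # block id -> all of its tokens
    for bid, tokens in blocks.items():
        ordered = sorted(tokens, key=position.__getitem__)
        for t in ordered[:prefix_len(len(ordered))]:
            partial[t].add(bid)
        forward[bid] = set(tokens)
    return position, partial, forward


def detect_clones(query_id, query, position, partial, forward):
    # Step (i): order the query's tokens and look up only its prefix in the partial index
    ordered = sorted(query, key=lambda t: position.get(t, -1))
    candidates = set()
    for t in ordered[:prefix_len(len(ordered))]:
        candidates |= partial.get(t, set())
    candidates.discard(query_id)
    # Verification: fetch each candidate's full token set from the forward index
    clones = []
    for cid in sorted(candidates):
        other = forward[cid]
        overlap = len(query & other)
        if Fraction(overlap, max(len(query), len(other))) >= THETA:
            clones.append(cid)
    return clones


blocks = {
    "b1": {"a", "b", "c", "d", "e"},
    "b2": {"a", "b", "c", "d", "f"},
    "b3": {"x", "y", "z"},
    "b4": {"a", "b", "x", "y", "w"},
}
indexes = build_indexes(blocks)
for bid in sorted(blocks):
    print(bid, detect_clones(bid, blocks[bid], *indexes))
```

Only `b1` and `b2` share 4 of 5 tokens (similarity 0.8), and only that pair is reported. `b4` shares `a` and `b` with `b1` and `b2`, but those frequent tokens fall outside every indexed prefix, so those pairs are never even scored, which is where the reduction in comparisons comes from.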
