SimCad: An extensible and faster clone detection tool for large scale software systems

Uddin, Sarder Nasir; Roy, Chanchal K.; Schneider, Kevin A.

doi:10.1109/icpc.2013.6613857

Cited by 26 publications

(16 citation statements)

References 7 publications


“…Hanni uses the clone detection library SimLib which is part of SimCad [21], an implementation of the textual clone detection approach Simhash [22]. We use SimLib because it allows the detection of clone types 1-3 and its Java implementation is portable and easily extensible.…”

Section: Methods (mentioning)

confidence: 99%


Detection of code clones in software generators

Lillack

Bucholdt

Schilling

2014

Proceedings of the 6th International Workshop on Feature-Oriented Software Development


Macro-based generators are in use for more than 40 years to generate Cobol source code and implement variability. Over the course of time, the systems were extended with many similar functionalities by copying and adapting existing pieces of code. The resulting generators are hard to understand and difficult to maintain. Clone detection can identify similar pieces of code which is a prerequisite to extract common features thus enabling a move to a feature-oriented product line. This paper presents Hanni, a tool that combines clone detection of the input and output of generators to improve detection quality. Hanni uses standard textual clone detection tools on macro-based generators to detect clones in the macros and the generated Cobol. A mapping of the clones from the two sources is used to verify the detected clones and even suggest possible semantic clones. We are using generator examples from different industries based on the ADS generator framework to evaluate our tool. The results show that code clones are very common in these generators and possible problems in the detection can be identified using the generated files.



“…Most clone detection tools use parser generators to support different programming languages. For example, Deckard uses YACC [10], NiCad [5] and SimCad [21] use TXL [4], CCFinder uses custom Python-based lexers [12].…”

Section: Methods (mentioning)

confidence: 99%

Detection of code clones in software generators

Lillack

Bucholdt

Schilling

2014

Proceedings of the 6th International Workshop on Feature-Oriented Software Development


“…Once the function pairs are established, we ran a set of publicly available clone detection tools, including Simcad [12], Nicad [9], MeCC [3] and CCCD [5]. We found that these tools are inconsistent for many of the clones detected.…”

Section: Determining Clones (mentioning)

confidence: 99%

A code clone oracle

Krutz

2014

Proceedings of the 11th Working Conference on Mining Software Repositories


Code clones are functionally equivalent code segments. Detecting code clones is important for determining bugs, fixes and software reuse. Code clone detection is also essential for developing fast and precise code search algorithms. However, the challenge of such research is to evaluate that the clones detected are indeed functionally equivalent, considering the majority of clones are not textual or even syntactically identical. The goal of this work is to generate a set of method level code clones with a high confidence to help to evaluate future code clone detection and code search tools to evaluate their techniques. We selected three open source programs, Apache, Python and PostgreSQL, and randomly sampled a total of 1536 function pairs. To confirm whether or not these function pairs indicate a clone and what types of clones they belong to, we recruited three experts who have experience in code clone research and four students who have experience in programming for manual inspection. For confidence of the data, the experts consulted multiple code clone detection tools to make the consensus. To assist manual inspection, we built a tool to automatically load function pairs of interest and record the manual inspection results. We found that none of the 66 pairs are textual identical type-1 clones, and 9 pairs are type-4 clones. Our data is available at


“…The core of the algorithm uses a hash function to generate simhash values. Among various non-cryptographic hash functions we use Jenkin hash function since it shows better similarity preserving behaviour compared to other functions and also found effective in detecting near-miss code fragments in other studies [1], [29], [30]. We generate a 64 bit simhash value for both context and content using the simhash algorithm [25].…”

Section: Generate Candidate List (mentioning)

confidence: 99%

LHDiff: A Language-Independent Hybrid Approach for Tracking Source Code Lines

Asaduzzaman

Roy

Schneider

et al. 2013

2013 IEEE International Conference on Software Maintenance

Self Cite


Tracking source code lines between two different versions of a file is a fundamental step for solving a number of important problems in software maintenance such as locating bug introducing changes, tracking code fragments or defects across versions, merging file versions, and software evolution analysis. Although a number of such approaches are available in the literature, their performance is sensitive to the kind and degree of source code changes. There is also a marked lack of study on the effect of change types on source location tracking techniques. In this paper, we propose a language-independent technique, LHDiff, for tracking source code lines across versions that leverages simhash technique together with heuristics to improve accuracy. We evaluate our approach against state-of-the-art techniques using benchmarks containing different degrees of changes where files are selected from real world applications. We further evaluate LHDiff with other techniques using a mutation based analysis to understand how different types of changes affect their performance. The results reveal that our technique is more effective than language-independent approaches and no worse than some language-dependent techniques. In our study LHDiff even shows better performance than a state-of-the-art language-dependent approach. In addition, we also discuss limitations of different line tracking techniques including ours and propose future research directions.
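
The LHDiff citation statement above describes generating a 64 bit simhash value from a hash function. A minimal sketch of that general idea follows, assuming the commonly described per-bit majority-vote form of simhash. It is not the LHDiff or SimCad implementation: the quoted passage uses a Jenkins hash, whereas this sketch substitutes BLAKE2b from the Python standard library as the token hash, and the names token_hash, simhash, and hamming_distance, along with the whitespace tokenization, are illustrative assumptions.

```python
import hashlib

BITS = 64  # width of the simhash value, matching the "64 bit" in the quote


def token_hash(token: str) -> int:
    """Return a 64-bit hash of one token.

    Assumption: BLAKE2b stands in for the Jenkins hash named in the quote.
    """
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens) -> int:
    """Combine token hashes into one fingerprint by per-bit majority vote."""
    counts = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            # A set bit votes +1 for that position, a clear bit votes -1.
            counts[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i, c in enumerate(counts):
        if c > 0:
            value |= 1 << i
    return value


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical source lines, tokenized on whitespace for illustration only.
line_a = "int total = count + offset ;".split()
line_b = "int total = count + delta ;".split()
line_c = "return new ArrayList < String > ( ) ;".split()

print("a vs b:", hamming_distance(simhash(line_a), simhash(line_b)))
print("a vs c:", hamming_distance(simhash(line_a), simhash(line_c)))
```

Fingerprints of similar token sequences tend to differ in few bits, so a small Hamming distance can be used to shortlist candidate matches before a more expensive comparison.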
