Clone Detection via Structural Abstraction

Evans, William S.; Fraser, Christopher W.; Ma, Fei

doi:10.1109/wcre.2007.15

Cited by 39 publications

(38 citation statements)

References 25 publications


“…In the present paper anti-unifiers are built in top-down manner by enlarging clusters and generalizing their anti-unifiers. It is difficult to compare the method of finding patterns, proposed in [4] and the method of building clusters from the current paper. But the anti-unification based approach is more flexible, because it is based on general notions such as distance between two statements and an "average value" of a set of statements.…”

Section: Comparison With Existing Approaches (mentioning)

confidence: 99%

“…We follow the approach of [4] using the notion of d-cap. The d-cap of a tree is obtained by replacing all subtrees of the level d and all leaves by placeholders.…”

Section: A Partitioning Similar Statements Into Clusters (mentioning)

confidence: 99%
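
The d-cap definition quoted above can be illustrated with a minimal sketch. The tree encoding (a `(label, children)` tuple), the convention that the root sits at level 0, and the names `d_cap` and `PLACEHOLDER` are assumptions made for this example; they are not taken from [4] or from the citing paper.

```python
PLACEHOLDER = "?"

def d_cap(tree, d, level=0):
    """Return the d-cap of `tree` (hypothetical encoding: (label, [children])).

    Every subtree rooted at level d, and every leaf above that level,
    is replaced by a placeholder, following the quoted definition.
    Assumption: the root is at level 0.
    """
    label, children = tree
    if level >= d or not children:
        return PLACEHOLDER
    return (label, [d_cap(child, d, level + 1) for child in children])

# Illustrative tree for the expression a[y+1]
expr = ("index", [("a", []), ("+", [("y", []), ("1", [])])])

print(d_cap(expr, 1))  # ('index', ['?', '?'])
print(d_cap(expr, 2))  # ('index', ['?', ('+', ['?', '?'])])
```

Because leaves are replaced as well, two statements with the same d-cap share their top-level shape but may differ in identifiers and constants, which is what makes the d-cap usable as a grouping key.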

“…The paper [4] is pioneering by performing fully structural abstraction rather than lexical one. For example, structural abstraction allows to catch the similarity between a[x] and a[y+1] using the tree pattern a[?].…”

Section: Comparison With Existing Approaches (mentioning)

confidence: 99%
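
The a[x] versus a[y+1] example above can be reproduced with a small sketch of pairwise structural generalization. This is a simplified illustration using one anonymous placeholder, not the algorithm of [4] or of the citing paper; the tree encoding and the names `generalize`, `render`, and `HOLE` are hypothetical.

```python
HOLE = "?"

def generalize(t1, t2):
    """Most specific common pattern of two trees, with HOLE where they differ.

    Trees use a hypothetical (label, [children]) encoding. Subtrees whose
    label or arity differ are abstracted into a single anonymous hole.
    """
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2 or len(kids1) != len(kids2):
        return HOLE
    return (label1, [generalize(a, b) for a, b in zip(kids1, kids2)])

def render(t):
    """Print a pattern back as source-like text (covers only this example's node kinds)."""
    if t == HOLE:
        return HOLE
    label, kids = t
    if label == "index":
        return f"{render(kids[0])}[{render(kids[1])}]"
    if label == "+":
        return f"{render(kids[0])}+{render(kids[1])}"
    return label

a_x = ("index", [("a", []), ("x", [])])                       # a[x]
a_y1 = ("index", [("a", []), ("+", [("y", []), ("1", [])])])  # a[y+1]

print(render(generalize(a_x, a_y1)))  # a[?]
```

A purely lexical abstraction could replace the token x by a placeholder but could not match it against the multi-token subexpression y+1; working on subtrees is what lets the hole absorb either one.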

“…The authors have developed comparison methods and benchmarks and have tested collection of tools in the same conditions. To our knowledge there are two tools that work on the abstract syntax tree level, CloneDR [8] and Asta [4]. Unfortunately, these tools are not publicly available and therefore we cannot compare all tools on the same program text corpora.…”

Section: Duplicate Code Detection Tool (mentioning)

confidence: 99%

“…A fully syntactic abstraction in duplicate clone detection is first reported in [4]. Their algorithm detects a similarity between, e.g., a[1] and a[x+1] by reducing them to the pattern a[?].…”

mentioning

confidence: 99%


Duplicate Code Detection Using Anti-Unification

Bulychev, Minea

2008

Proceedings of the Spring/Summer Young Researchers' Colloquium on Software Engineering


Abstract: This paper describes a new algorithm for finding software clones. It is conceptually independent of the source language of the analyzed programs, working at the level of abstract syntax trees. The algorithm considers that two sequences of statements form a clone if one of them can be obtained from the other by replacing some subtrees. To our knowledge this notion was not previously employed in the literature. It allows to take into account all information on the syntactic structure of a program. We have implemented this algorithm in the tool Clone Digger. It currently supports the Python and Java languages. Clone Digger is free and provided under the GPL license.

I. INTRODUCTION

Different researchers report that the amount of duplicate code in software systems varies from 6.4%-7.5% to 13%-20% [1]. Duplicate code can occur as a result of approaches to development and maintenance, due to language or programmer limitations, or simply by accident [1]. Code duplication can be a significant drawback, leading to bad design, and increased probability of bug occurrence and propagation. As a result, it can significantly increase maintenance cost (for instance, any bug in the original has to be fixed in all duplicates), and form a barrier for software evolution. Consequently, duplicate code detectors are a useful class of software analysis tools. Such tools can aid in measuring the quality of software systems and in the process of refactoring.

Techniques for detecting duplicate code can be classified according to several criteria. Code can be viewed as similar based on syntactic criteria or at a semantic level (from the point of view of execution effects). In this paper we consider only syntactic similarity. Within this category, duplicate clone detection can be performed at different levels of granularity: strings, tokens, abstract syntax trees, feature vectors [1]. The first two are quite rigid and low-level, therefore we use an approach based on abstract syntax trees.

Two sequences of statements form duplicate code if they are similar enough according to a selected measure of similarity. Such measures can be defined using a set of allowed editing operations and their cost. According to [1] there are three different types of syntactic changes: adding/removing of whitespaces and comments, changing names of variables, and more complex modifications. We aim to detect a wide range of clones, including the third type: e.g., expressions with similar structure.

In essence, we wish to characterize the structural similarity of two code fragments in order to determine whether they should be classified as code duplicates. We can formalize this by using the concept of anti-unifier, which denotes the



An empirical study on how project context impacts on code cloning

Pérez‐Castillo

Piattini

2018

J Software Evolu Process


Code cloning can seriously affect software quality. Code clones are various fragments of syntactically or semantically equivalent code. Some authors argue that code clones have a negative impact on maintainability and understandability, since clones propagate defects and make it mandatory to pay attention to several copies. However, other authors believe clones are not necessarily bad, since self‐admitted clones favor system stability and allow developers to move projects forward. Although some root causes and effects of cloning have been widely studied, there is not much relevant work analyzing how certain projects context factors impact on code cloning. This work presents an empirical validation of six open source projects by considering certain factors from Git repositories measured throughout a total of 70 releases for the 6 systems. The factors analyzed were the number of commits and committers per release, the average size of the commits and the size of the system in each release. The main conclusion obtained from the study is that, while the number of commits and committers and the system size do not significantly affect cloning, larger commits lead to a higher cloning ratio. These insights contribute to predicting and preventing code cloning, thus enabling a software quality improvement.


Software smell detection techniques: A systematic literature review

AbuHassan

Alshayeb

Ghouti

2020

J Software Evolu Process


Software smells indicate design or code issues that might degrade the evolution and maintenance of software systems. Detecting and identifying these issues are challenging tasks. This paper explores, identifies, and analyzes the existing software smell detection techniques at design and code levels. We carried out a systematic literature review (SLR) to identify and collect 145 primary studies related to smell detection in software design and code. Based on these studies, we address several questions related to the analysis of the existing smell detection techniques in terms of abstraction level (design or code), targeted smells, used metrics, implementation, and validation. Our analysis identified several detection techniques categories. We observed that 57% of the studies did not use any performance measures, 41% of them omitted details on the targeted programming language, and the detection techniques were not validated in 14% of these studies. With respect to the abstraction level, only 18% of the studies addressed bad smell detection at the design level. This low coverage urges for more focus on bad smell detection at the design level to handle them at early stages. Finally, our SLR brings to the attention of the research community several opportunities for future research.

