Cory Kapser scite author profile

Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system's design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.


"Cloning Considered Harmful" Considered Harmful

Kapser

Godfrey

2006

185

125


Aiding comprehension of cloning through categorization

Kapser

Godfrey


Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly, clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to augment current clone detection tools in order to increase user comprehension of code duplication within software systems and to filter false positives from the clone set. We support our arguments by means of two case studies, where we found that as many as 53% of clones can be grouped to form Function clones or Partial Function clones, and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs.


Supporting the analysis of clones in software systems

Kapser

Godfrey

2006

J. Softw. Maint. Evol.: Res. Pract.


Code duplication is a well-documented problem in industrial software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, there have been few detailed qualitative studies into how cloning actually manifests itself within software systems. This is primarily due to the large result sets that many clone-detection tools return; these result sets are very difficult to manage without complementary tool support that can scale to the size of the problem, and this kind of support does not currently exist. In this paper we present an in-depth case study of cloning in a large software system that is in wide use, the Apache Web server; we provide insights into cloning as it exists in this system, and we demonstrate techniques to manage and make effective use of the large result sets of clone-detection tools. In our case study, we found several interesting types of cloning occurrences, such as 'cloning hotspots', where a single subsystem comprising only 17% of the system code contained 38.8% of the clones. We also found several examples of cloning behavior that were beneficial to the development of the system, in particular cloning as a way to add experimental functionality.

(1) facilities to evaluate the overall cloning situation; (2) mechanisms to guide users toward clones that are most relevant to their task; and (3) methods for filtering and refining the analysis of the clones. Each of these criteria is described in more detail below.

Overall system evaluation

As a first step in understanding cloning within a software system, regardless of the end goal, maintainers must have an understanding of the cloning from a high level of abstraction. This understanding will allow the user to evaluate the extent and the severity of the duplication in order to estimate the cost and/or necessity of the task.

Several mechanisms can be used to evaluate cloning from a high level. Visualization methods, such as scatterplots [1,3,4,12,15], are useful for the discovery of highly related subsystems and high levels of cloning within a subsystem. They are also useful for detecting unusual types of cloning, such as cloning from system libraries to other parts of the software system. Metric-oriented reports, such as reporting the percentage of lines cloned, average length of a clone, etc., are useful for directing users to points in the system where the most cloning is occurring, or where cloning activities are unusually high in relation to subsystem size.

Guide and empower the user

The possibly large sets of clones returned by the clone-detection methods make it infeasible to look at each individual clone. There are several ways to direct users toward the clones they seek. Metrics can be used to query the dataset [16]. Some examples of metrics that might be used are the size of the clone, the types of changes made to the clone, and types of external dependencies a code segment has. Such a method can direct users to promising refactoring ...
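The metric-oriented reporting and metric-based querying described in this excerpt can be illustrated with a minimal sketch. It is not the authors' tooling: the `Clone` record, the function names, the subsystem labels, and the `ratio` threshold for flagging a hotspot are all assumptions made for illustration, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """One detected clone region (hypothetical record layout)."""
    subsystem: str  # illustrative subsystem label
    path: str       # file containing the region
    start: int      # first line, inclusive
    end: int        # last line, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def subsystem_report(clones, subsystem_loc):
    """Per subsystem: share of system code, share of clones, mean clone length."""
    total_loc = sum(subsystem_loc.values())
    total_clones = len(clones)
    grouped = defaultdict(list)
    for clone in clones:
        grouped[clone.subsystem].append(clone)

    report = {}
    for name, loc in subsystem_loc.items():
        found = grouped.get(name, [])
        report[name] = {
            "code_share": loc / total_loc,
            "clone_share": len(found) / total_clones if total_clones else 0.0,
            "avg_clone_length": (
                sum(c.length for c in found) / len(found) if found else 0.0
            ),
        }
    return report


def hotspots(report, ratio=2.0):
    """Subsystems whose clone share is at least `ratio` times their code share.

    The threshold is an assumption, not a value taken from the paper.
    """
    return [
        name
        for name, row in report.items()
        if row["code_share"] > 0 and row["clone_share"] >= ratio * row["code_share"]
    ]


def query_by_size(clones, min_length):
    """Metric query: keep only clones of at least `min_length` lines."""
    return [c for c in clones if c.length >= min_length]


if __name__ == "__main__":
    # Invented data: "sub_a" holds 17% of the code but most of the clones.
    loc = {"sub_a": 1700, "sub_b": 8300}
    found = [
        Clone("sub_a", "a/x.c", 1, 10),
        Clone("sub_a", "a/y.c", 5, 24),
        Clone("sub_a", "a/z.c", 100, 129),
        Clone("sub_b", "b/w.c", 40, 44),
    ]
    rep = subsystem_report(found, loc)
    print(hotspots(rep))
    print(query_by_size(found, 20))
```

Comparing each subsystem's share of clones with its share of code is one simple way to surface the kind of disproportion the abstract reports (17% of the code, 38.8% of the clones); a real tool would likely count cloned lines as well as clone counts.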


Cloning by accident: an empirical study of source code cloning across software systems

Al-Ekram

Kapser

Holt

et al.



“Cloning considered harmful” considered harmful: patterns of cloning in software
