Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system's design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.
Current literature on the topic of duplicated (cloned)
Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to augment current clone detection tools in order to increase the user comprehension of duplication of code within software systems and filter false positives from the clone set. We support our arguments by means of 2 case studies, where we found that as much as 53% of clones can be grouped to form Function clones or Partial Function clones and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs.
Code duplication is a well-documented problem in industrial software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, there have been few detailed qualitative studies into how cloning actually manifests itself within software systems. This is primarily due to the large result sets that many clonedetection tools return; these result sets are very difficult to manage without complementary tool support that can scale to the size of the problem, and this kind of support does not currently exist. In this paper we present an in-depth case study of cloning in a large software system that is in wide use, the Apache Web server; we provide insights into cloning as it exists in this system, and we demonstrate techniques to manage and make effective use of the large result sets of clone-detection tools. In our case study, we found several interesting types of cloning occurrences, such as 'cloning hotspots', where a single subsystem comprising only 17% of the system code contained 38.8% of the clones. We also found several examples of cloning behavior that were beneficial to the development of the system, in particular cloning as a way to add experimental functionality.(1) facilities to evaluate the overall cloning situation;(2) mechanisms to guide users toward clones that are most relevant to their task; and (3) methods for filtering and refining the analysis of the clones.Each of these criteria is described in more detail below. Overall system evaluationAs a first step in understanding cloning within a software system, regardless of the end goal, maintainers must have an understanding of the cloning from a high level of abstraction. This understanding will allow the user to evaluate the extent and the severity of the duplication in order to estimate the cost and/or necessity of the task.Several mechanisms can be used to evaluate cloning from a high level. Visualization methods, such as scatterplots [1,3,4,12,15], are useful for the discovery of highly related subsystems and high levels of cloning within a subsystem. They are also useful for detecting unusual types of cloning, such as cloning from system libraries to other parts of the software system. Metric-oriented reports, such as reporting the percentage of lines cloned, average length of a clone, etc., are useful for directing users to points in the system where the most cloning is occurring, or where cloning activities are unusually high in relation to subsystem size. Guide and empower the userThe possibly large sets of clones returned by the clone-detection methods make it infeasible to look at each individual clone. There are several ways to direct users toward the clones they seek. Metrics can be used to query the dataset [16]. Some examples of metrics that might be used are the size of the clone, the types of changes made to the clone, and types of external dependencies a code segment has. Such a method can direct users to promising refactoring ...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.