Automated techniques have been proposed either to identify refactoring opportunities (i.e., code fragments that can be, but have not yet been, restructured in a program) or to reconstruct historical refactorings (i.e., code restructuring operations that have happened between different versions of a program). In this paper, we propose a new technique that can detect both refactoring opportunities and historical refactorings in large code bases. The key to our technique is the design of vector abstraction and concretization operations that encode the code changes induced by certain refactorings as characteristic vectors. The problem of identifying refactorings is thereby reduced to the problem of identifying matching vectors, which can be solved efficiently. We have implemented our technique for Java. We applied the prototype to 200 bundle projects from the Eclipse ecosystem containing 4.5 million lines of code; it reported more than 32K instances of 17 types of refactoring opportunities in total, taking 25 minutes on average for each type. We also applied the prototype to 14 versions of 3 smaller programs (JMeter, Ant, XML-Security), where it detected (1) more than 2.8K refactoring opportunities within individual versions with a precision of about 87%, and (2) more than 190 historical refactorings across consecutive versions of the programs with a precision of about 92%.
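The reduction to vector matching can be illustrated with a minimal sketch. It is not the paper's abstraction or concretization operations, which the abstract does not define. It assumes, purely for illustration, that a characteristic vector is a count of syntax-tree node kinds, and it uses Python's standard `ast` module on Python fragments for brevity, although the described prototype targets Java. The names `characteristic_vector` and `group_matching` are hypothetical.

```python
import ast
from collections import Counter, defaultdict


def characteristic_vector(source: str) -> tuple:
    """Abstract a code fragment into a hashable vector of node-kind counts.

    Assumption: node-kind counts stand in for the paper's vectors.
    """
    counts = Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))
    # Sort so that equal count maps always produce the same tuple.
    return tuple(sorted(counts.items()))


def group_matching(fragments: dict) -> list:
    """Bucket fragments by vector; buckets with 2+ members are matches.

    Hashing each vector makes matching roughly linear in the number of
    fragments, instead of comparing every pair.
    """
    buckets = defaultdict(list)
    for name, source in fragments.items():
        buckets[characteristic_vector(source)].append(name)
    return [sorted(names) for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    fragments = {
        "f1": "a = b + 1",
        "f2": "x = y + 2",  # same shape as f1, different identifiers
        "f3": "a = b * 1",  # different operator, so a different vector
    }
    print(group_matching(fragments))
```

Exact matching by hashing is the simplest choice; a tolerance-based match (for example, a small vector distance) would need a nearest-neighbour search instead of a dictionary lookup.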
Refactoring is an important way to improve the design of existing code. Identifying refactoring opportunities (i.e., code fragments that can be refactored) in large code bases is a challenging task. In this paper, we propose a novel, automated, and scalable technique for identifying cross-function refactoring opportunities, that is, opportunities that span more than one function (e.g., Extract Method and Inline Method). The key to our technique is the design of efficient vector inlining operations that emulate the effect of method inlining among code fragments, so that the problem of identifying cross-function refactoring is reduced to the problem of finding similar vectors before and after inlining. We have implemented our technique in a prototype tool named ReDex, which encodes Java programs as vectors. We applied the tool to a large code base of 4.5 million lines of code, comprising 200 bundle projects in the Eclipse ecosystem (e.g., Eclipse JDT, Eclipse PDE, Apache Commons, and Hamcrest). Unlike many other studies on detecting refactoring, ReDex searches only for code fragments that can be, but have not yet been, refactored in a way similar to some refactoring that has already happened in the code base. Our results show that ReDex found 277 cross-function refactoring opportunities in 2 minutes, of which 223 were labelled as true opportunities by users. These cover many categories of cross-function refactoring operations described in classical refactoring books, such as Self Encapsulate Field, Decompose Conditional Expression, Hide Delegate, and Preserve Whole Object.
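The idea of vector inlining can be sketched as follows. This is an illustrative approximation under stated assumptions, not ReDex's actual operation: vectors are assumed to be counts of node kinds, inlining is approximated by removing one call node from the caller's vector and adding the callee's counts, and similarity is measured with an L1 distance. The function names and node-kind labels are hypothetical, and the sketch ignores parameter passing and return handling.

```python
from collections import Counter


def inline_vector(caller: Counter, callee: Counter, call_kind: str = "Call") -> Counter:
    """Emulate inlining one call on vectors: drop a call node, add the callee body."""
    if caller[call_kind] < 1:
        raise ValueError("caller vector has no call site to inline")
    result = caller.copy()
    result[call_kind] -= 1
    result.update(callee)  # Counter.update adds counts rather than replacing them
    return +result         # unary plus drops entries whose count fell to zero


def l1_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences; 0 means the vectors match exactly."""
    return sum(abs(a[kind] - b[kind]) for kind in a.keys() | b.keys())


if __name__ == "__main__":
    # Hypothetical vectors: a long method, and a caller/callee pair that
    # together perform the same work.
    long_method = Counter({"If": 1, "Assign": 2, "BinOp": 1, "Return": 1})
    caller = Counter({"Call": 1, "Assign": 1, "Return": 1})
    callee = Counter({"If": 1, "Assign": 1, "BinOp": 1})

    print(l1_distance(caller, long_method))                         # before inlining
    print(l1_distance(inline_vector(caller, callee), long_method))  # after inlining
```

In this toy setup the caller looks unlike the long method until the callee is inlined into its vector, at which point the two match. A pair of that kind is a candidate for applying the same extraction to the long method, which is the sort of cross-function opportunity the abstract describes.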