A large-scale empirical exploration on refactoring activities in open source software projects

Vassallo, Carmine; Grano, Giovanni; Palomba, Fabio; Gall, Harald C.; Bacchelli, Alberto

doi:10.1016/j.scico.2019.05.002

Cited by 40 publications

(27 citation statements)

References 53 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…All the aforementioned work relate refactoring actions to quality attributes, such as metrics, code smells, or to process indicators (as Vassallo et al [48] did), whereas our study relates refactoring actions to bug introduction, while considering the effect of some change metrics (i.e., change size) as a co-factor. Our study allowed to (partially) corroborate previous findings reported in the literature [8].…”

Section: Related Workmentioning

confidence: 99%

“…Vassallo et al [48] mined 200 systems to quantitatively investigate factors correlating with refactoring, looking at when, why, and by whom refactoring is performed. Their results show that refactorings (i) are rarely performed close to a new release; (ii) are mainly performed while improving existing features; and (iii) are mainly done by the owners of the code components being refactored.…”

Section: Related Workmentioning

confidence: 99%

“…Software refactoring has been extensively studied by the research community, through empirical studies investigating how and why developers perform refactoring [32,37,39,43,48,49], how refactoring relates with other development tasks (e.g., merge conflicts [35]), with software quality indicators (e.g., quality metrics) [5,17,45,46], and with developers' productivity [36]. Some studies (e.g., Kim et al.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

On the relationship between refactoring actions and bugs: a differentiated replication

Penta

Bavota

Zampetti

2020

Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Softw

View full text Add to dashboard Cite

Software refactoring aims at improving code quality while preserving the system's external behavior. Although in principle refactoring is a behavior-preserving activity, a study presented by Bavota et al. in 2012 reported the proneness of some refactoring actions (e.g., pull up method) to induce faults. The study was performed by mining refactoring activities and bugs from three systems. Taking profit of the advances made in the mining software repositories field (e.g., better tools to detect refactoring actions at commit-level granularity), we present a differentiated replication of the work by Bavota et al. in which we (i) overcome some of the weaknesses that affect their experimental design, (ii) answer the same research questions of the original study on a much larger dataset (3 vs 103 systems), and (iii) complement the quantitative analysis of the relationship between refactoring and bugs with a qualitative, manual inspection of commits aimed at verifying the extent to which refactoring actions trigger bug-fixing activities. The results of our quantitative analysis confirm the findings of the replicated study, while the qualitative analysis partially demystifies the role played by refactoring actions in the bug introduction. CCS CONCEPTS • Software and its engineering → Software reliability; Designing software.

show abstract

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

On the relationship between refactoring actions and bugs: a differentiated replication

Penta

Bavota

Zampetti

2020

Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Softw

View full text Add to dashboard Cite

show abstract

“…Tsantalis et al [30] automatically detect refactorings in commit history; however, their approach is currently limited to fine-grained analysis of classical refactorings, supports only Java, which is problematic for multilanguage ML systems, and does not correlate technical debt. Kim et al [36] study refactoring challenges and benefits at Microsoft, while Vassallo et al [69] perform a large-scale refactoring study on open-source software, and Murphy-Hill et al [70] study general refactoring at the IDE level. Sousa et al [71] characterize composite refactorings, Hora and Robbes [72] explore the characteristics of method extraction refactorings, Peruma et al [73] investigate refactorings of unit tests in Android, and Bavota et al [60] and Ferreira et al [74] study fault inducing refactoring activities.…”

Section: Re L a T E D W O R Kmentioning

confidence: 99%

An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

Tang

Khatchadourian

Bagherzadeh

et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

View full text Add to dashboard Cite

Machine Learning (ML), including Deep Learning (DL), systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they are comprised of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap of knowledge in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semanticspreserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML, some refactorings correspond to established technical debt categories, while others do not, and code duplication is a major crosscutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 and 7 new ML-specific refactorings and technical debt categories, respectively, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.Index Terms-empirical studies, refactoring, machine learning systems, technical debt, software repository mining I. In t r o d u c t io nIn the big data era, Machine Learning (ML), including Deep Learning (DL), systems are pervasive in modern society. Central to these systems are dynamic ML models, whose behavior is ultimately defined by their input data. However, such systems do not only consist of ML models; instead, ML systems typically encompass complex subsystems that support ML processes [1]. ML systems-like other long-lived, complex systems-are prone to classic technical debt [2 ] issues; yet, they also exhibit debt specific to such systems [3]. While work exist on applying software engineering (SE) rigor to ML systems [4]- [12], there is generally a gap of knowledge in how ML systems actually evolve and are maintained. As ML systems become more difficult and expensive to maintain [1 ], understanding the kinds of modifications developers are required to make to such systems-our overarching research question-is of the utmost importance.To fill this gap, we performed an empirical study on common refactorings, i.e., source-to-source semantics preserving program transformations-a widely accepted mechanism for effectively reducing technical debt [13]-[16]-in real-world, open-source ML systems. We set out to discover (i) the kinds of refactorings-both specific and tangential to ML-performed, (ii) whether particular refactorings occurred more often in model code vs. other supporting subsystems, (iii) the types of technical debt ...

show abstract

“…Recent findings have shown how developers' experience constitutes a key factor to carefully consider during maintenance tasks [2], [3], [8]- [10], [32], [33]. For instance, Bhatt et al [10] found that human and organizational factors such as organization climate, customer attitude, and developers' experience have a significant influence on software maintenance effort.…”

Section: B On the Use Of Developers' Experiencementioning

confidence: 99%

How the Experience of Development Teams Relates to Assertion Density of Test Classes

Catolino

Palomba

Zaidman³

et al. 2019

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Self Cite

View full text Add to dashboard Cite

A large-scale empirical exploration on refactoring activities in open source software projects

Cited by 40 publications

References 53 publications

On the relationship between refactoring actions and bugs: a differentiated replication

On the relationship between refactoring actions and bugs: a differentiated replication

An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

How the Experience of Development Teams Relates to Assertion Density of Test Classes

Contact Info

Product

Resources

About