Software defects are a major nuisance in software development and can lead to considerable financial losses or reputation damage for companies. To address this problem, a large number of techniques for predicting software defects, largely based on machine-learning methods, have been developed over the past decades. These techniques usually rely on code-structure and process metrics to predict defects at the granularity of typical software assets, such as subsystems, components, and files. In this paper, we systematically investigate feature-oriented defect prediction: predicting defects at the granularity of features, i.e., domain entities that abstractly represent software functionality and often cross-cut software assets. Feature-oriented prediction can be beneficial, since: (i) particular features might be more error-prone than others, (ii) characteristics of features known to be defective might be useful for predicting other error-prone features, and (iii) feature-specific code might be especially prone to faults arising from feature interactions. We explore the feasibility and solution space of feature-oriented defect prediction by designing and investigating prediction scenarios, metrics, and classifiers. Our study relies on 12 software projects, from which we analyzed 13,685 bug-introducing and corrective commits and systematically generated 62,868 training and test datasets to evaluate the designed classifiers, metrics, and scenarios. Depending on the scenario, the datasets were generated from the 13,685 commits, 81 releases, and 24,532 permutations of our 12 projects. The scenarios we covered include just-in-time (JIT) and cross-project defect prediction. Our results confirm the feasibility of feature-oriented defect prediction. We obtained the best performance (i.e., precision and robustness) with the Random Forest classifier using process and structure metrics. Surprisingly, we found high performance for single-project JIT (median AUROC ≥ 95%) and release-level (median AUROC ≥ 90%) defect prediction, contrary to studies that report poor performance due to insufficient training data. Lastly, we found that a model trained on release-level data from one of the twelve projects could predict the defect-proneness of features in the other eleven projects with a median performance of 82%, without retraining on the target projects. Our results suggest potential for reusing defect-prediction models across projects, as well as for more reliable defect predictions for developers as they modify or release software features.
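To make the evaluation setup concrete, the following minimal sketch (not the authors' tooling) shows how a Random Forest classifier can be trained on tabular per-feature process and structure metrics and scored with AUROC, as in the study; the metric columns and data here are synthetic and purely illustrative.

```python
# Illustrative sketch: train a Random Forest on hypothetical per-feature
# metrics and evaluate it with AUROC. The data below is synthetic; real
# inputs would be process/structure metrics computed per feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical per-feature metrics, e.g., feature code size (structure)
# and number of commits touching the feature (process).
n_rows = 500
X = rng.normal(size=(n_rows, 4))  # four made-up metric columns
y = (X[:, 0] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n_rows) > 0).astype(int)  # 1 = defect-prone

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# AUROC on held-out data, the performance measure reported in the study.
scores = clf.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, scores))
```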