Just because software developers say they believe in "X", that does not necessarily mean that "X" is true. As shown here, there exist numerous beliefs listed in the recent Software Engineering literature which are only supported by small portions of the available data. Hence we ask what is the source of this disconnect between beliefs and evidence?.To answer this question we look for evidence for ten beliefs within 300,000+ changes seen in dozens of open-source projects. Some of those beliefs had strong support across all the projects; specifically, "A commit that involves more added and removed lines is more bug-prone" and "Files with fewer lines contributed by their owners (who contribute most changes) are bug-prone".Most of the widely-held beliefs studied are only sporadically supported in the data; i.e. large effects can appear in project data and then disappear in subsequent releases. Such sporadic support explains why developers believe things that were relevant to their prior work, but not necessarily their current work.Our conclusion will be that we need to change the nature of the debate with Software Engineering. Specifically, while it is important to report the effects that hold right now, it is also important to report on what effects change over time. CCS CONCEPTS• Software and its engineering → Maintaining software.
In this paper we study the trustworthiness of the crowd for crowdsourced software development. Through the study of literature from various domains, we present the risks that impact the trustworthiness in an enterprise context. We survey known techniques to mitigate these risks. We also analyze key metrics from multiple years of empirical data of actual crowdsourced software development tasks from two leading vendors. We present the metrics around untrustworthy behavior and the performance of certain mitigation techniques. Our study and results can serve as guidelines for crowdsourced enterprise software development.
Many researchers assume that, for software analytics, "more data is better." We write to show that, at least for learning defect predictors, this may not be true.To demonstrate this, we analyzed hundreds of popular GitHub projects. These projects ran for 84 months and contained 3,728 commits (median values). Across these projects, most of the defects occur very early in their life cycle. Hence, defect predictors learned from the first 150 commits and four months perform just as well as anything else. This means that, at least for the projects studied here, after the first few months, we need not continually update our defect prediction models.We hope these results inspire other researchers to adopt a "simplicity-first" approach to their work. Some domains require a complex and data-hungry analysis. But before assuming complexity, it is prudent to check the raw data looking for "short cuts" that can simplify the analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.