A Model of the Commit Size Distribution of Open Source

Kolassa, Carsten; Riehle, Dirk; Salim, Michel A.

doi:10.1007/978-3-642-35843-2_6

Cited by 19 publications

(15 citation statements)

References 14 publications


“…Inspection by hand shows that many times, code is being committed in large chunks. This is uncommon in traditional open source software development, where the most frequent commit size is one line of code [16]. Thus, it is safe to assume that these small but still active projects are being developed in-house and are being provided in a snapshot-style to the public at appropriate times.…”

Section: Project Classification (mentioning)

confidence: 99%

Paid vs. Volunteer Work in Open Source

Riehle, Riemer, Kolassa et al. 2014

2014 47th Hawaii International Conference on System Sciences

Self Cite


Many open source projects have long become commercial. This paper shows just how much of open source software development is paid work and how much has remained volunteer work. Using a conservative approach, we find that about 50% of all open source software development has been paid work for many years now and that many small projects are fully paid for by companies. However, we also find that any non-trivial project balances the amount of paid developer with volunteer work, and we suggest that the ratio of volunteer to paid work can serve as an indicator for the health of open source projects and aid the management of the respective communities.

Index Terms: open source software, empirical software engineering, volunteer open source, paid open source.

“…The success of these approaches depends on the accuracy of feature location techniques which are often still low [32]. Also, the quality of the code comments and commits information can be poor due to outdated comments [19], unavailability of authorship information for authors without commit rights in CVS and SVN repositories [23], etc. In this work, we focus on analyzing textual information available in bug reports to recommend appropriate fixers.…”

Section: Automated Bug Triaging (mentioning)

confidence: 99%

Improving Automated Bug Triaging with Specialized Topic Model

Xia, Ding et al. 2017

IEEE Trans. Software Eng.


Bug triaging refers to the process of assigning a bug to the most appropriate developer to fix. It becomes more and more difficult and complicated as the size of software and the number of developers increase. In this paper, we propose a new framework for bug triaging, which maps the words in the bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM) which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers product and component information of bug reports to map the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner which considers the topic distribution of a new bug report to assign an appropriate fixer based on the affinity of the fixer to the topics. We pair TopicMiner with MTM (TopicMiner^MTM). We have evaluated our solution on 5 large bug report datasets including GCC, OpenOffice, Mozilla, Netbeans, and Eclipse containing a total of 227,278 bug reports. We show that TopicMiner^MTM can achieve top-1 and top-5 prediction accuracies of 0.4831 to 0.6868 and 0.7686 to 0.9084, respectively. We also compare TopicMiner^MTM with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.'s approach. The results show that TopicMiner^MTM on average improves top-1 and top-5 prediction accuracies of Bugzie by 128.48% and 53.22%, LDA-KL by 262.91% and 105.97%, SVM-LDA by 205.89% and 110.48%, LDA-Activity by 377.60% and 176.32%, and Yang et al.'s approach by 59.88% and 13.70%, respectively.


“…Third, file-or class-level granularity could be too large to learn patterns of transformation. Finally, considering arbitrarily long snippets of code, such as hunks in diffs, could make the learning more difficult given the variability in size and context [45], [46]. Note that we consider each TP as an independent fix, meaning that multiple methods modified in the same bug fixing activity are considered independently from one other.…”

Section: B Analysis Of Transformation Pairs (mentioning)

confidence: 99%

Learning How to Mutate Source Code from Bug-Fixes

Tufano, Watson, Bavota et al. 2019

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)


Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a clear need for better, possibly customized, mutation operators and strategies. While methods to devise domain-specific or general-purpose mutation operators from real faults exist, they are effort- and error-prone, and do not help the tester to decide whether and how to mutate a given source code element. We propose a novel approach to automatically learn mutants from faults in real programs. First, our approach processes bug fixing changes using fine-grained differencing, code abstraction, and change clustering. Then, it learns mutation models using a deep learning strategy. We have trained and evaluated our technique on a set of ∼787k bug fixes mined from GitHub. Our empirical evaluation showed that our models are able to predict mutants that resemble the actual fixed bugs in between 9% and 45% of the cases, and over 98% of the automatically generated mutants are lexically and syntactically correct.

Index Terms: mutation testing, deep learning, neural networks

• A novel approach for learning how to mutate code from bug-fixes.
• Empirical evidence that our models are able to learn diverse mutation operators that are closely related to real bugs.
• Data and source code to enable replication [38].
