Context
Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they study not only bug fixes, but also other changes that are irrelevant to the study of bugs.
Objective
We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits.
Methods
We use a crowdsourcing approach to manually label, for each line in bug fixing commits, whether the change contributes to the bug fix. Each line is labeled by four participants. If at least three participants agree on the same label, we consider this a consensus.
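A minimal sketch of this consensus rule follows (an illustrative reconstruction, not the authors' actual tooling; the label names and data layout are assumptions):

```python
from collections import Counter

# Each changed line receives labels from four participants; a label becomes
# the consensus if at least three of the four participants agree on it.
CONSENSUS_THRESHOLD = 3

def consensus(labels: list[str]) -> str | None:
    """Return the consensus label, or None if participants disagree."""
    assert len(labels) == 4, "each line is labeled by four participants"
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= CONSENSUS_THRESHOLD else None

# Hypothetical examples: three of four participants agree vs. a split vote.
print(consensus(["bugfix", "bugfix", "bugfix", "refactoring"]))  # -> "bugfix"
print(consensus(["bugfix", "bugfix", "test", "refactoring"]))    # -> None
```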
Results
We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that, depending on the use case, 3% to 47% of the data is noisy without manual untangling.
Conclusion
Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptical and assume that unvalidated data is likely very noisy, until proven otherwise.
Modeling and implementing auction systems with agent technology is common practice, because agents can assume various roles and their behavior is determined through negotiation. However, emergent behavior is a hurdle: mechanisms must be in place to ensure that agents participating in the auction system do not behave in unintended ways. Detecting emergent behavior in the design phase rather than after deployment is more cost- and effort-efficient. Patterns of interaction, called scenarios, are the basic modeling constructs for the design and behavioral modeling of agents. However, working with several agents in an online auction system requires a large number of scenarios. Transforming the scenarios into finite state machines (FSMs) and executing the FSMs in parallel during the behavioral synthesis phase may therefore lead to computational overload. So far, research has focused on ways of detecting emergent behavior, while the scalability of behavioral modeling has remained an issue. In this paper, a method to identify agents that will not cause emergent behavior is introduced. By eliminating these agents from the behavioral modeling phase, the number of FSMs and their states is reduced. The method is explained along with a case study of a realistic online auction system, in which it led to a 33% reduction in synthesized FSMs.
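The pruning idea can be illustrated with a short sketch (an assumption-laden reconstruction, not the paper's method: the scenario encoding, the cannot_cause_emergence predicate, and the per-agent FSM construction are all hypothetical):

```python
# Sketch: agents judged incapable of causing emergent behavior (here decided
# by a placeholder predicate, NOT the paper's criterion) are dropped before
# FSM synthesis, so fewer FSMs are built and executed in parallel.

Scenario = list[tuple[str, str, str]]  # (sender, receiver, message)

def cannot_cause_emergence(agent: str, scenarios: list[Scenario]) -> bool:
    # Placeholder criterion: an agent that only ever receives messages
    # cannot initiate an unintended interaction. The real criterion in
    # the paper is more involved.
    return all(sender != agent for s in scenarios for sender, _, _ in s)

def synthesize_fsms(scenarios: list[Scenario]) -> dict[str, list[tuple]]:
    agents = {a for s in scenarios for snd, rcv, _ in s for a in (snd, rcv)}
    fsms: dict[str, list[tuple]] = {}
    for agent in agents:
        if cannot_cause_emergence(agent, scenarios):
            continue  # pruned: no FSM is synthesized for this agent
        # Build a linear FSM per agent: one transition per event it takes
        # part in, encoded as (state, message, next_state).
        transitions, state = [], 0
        for s in scenarios:
            for snd, rcv, msg in s:
                if agent in (snd, rcv):
                    transitions.append((state, msg, state + 1))
                    state += 1
        fsms[agent] = transitions
    return fsms

# Hypothetical auction scenario: bidder2 only receives notifications,
# so it is pruned and no FSM is synthesized for it.
auction = [("bidder1", "auctioneer", "bid"),
           ("auctioneer", "bidder2", "outbid-notice")]
print(synthesize_fsms([auction]).keys())  # bidder1 and auctioneer only
```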