Context
Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs.
Objective
We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits.
Methods
We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus.
Results
We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case.
Conclusion
Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
Predictive mutation testing (PMT) is a technique to predict whether a mutant is killed, using machine learning approaches. Researchers have proposed various methods for PMT over the years. However, the impact of unreached mutants on PMT is not fully addressed. A mutant is unreached if the statement on which the mutant is generated is not executed by any test cases. We aim at showing that unreached mutants can inflate PMT results. Moreover, we propose an alternative approach to PMT, suggesting a different interpretation for PMT. To this end, we replicated the previous PMT research. We empirically evaluated the suggested approach on 654 Java projects provided by prior literature. Our results indicate that the performance of PMT drastically decreases in terms of area under a receiver operating characteristic curve (AUC) from 0.833 to 0.517. Furthermore, PMT performs worse than random guesses on 27% of the projects. The proposed approach improves the PMT results, achieving the average AUC value of 0.613. As a result, we recommend researchers to remove unreached mutants when reporting the results.
Web robots also known as crawlers or spiders are used by search engines, hackers and spammers to gather information about web pages. Timely detection and prevention of unwanted crawlers increases privacy and security of websites. In this paper, a novel method to identify web crawlers is proposed to prevent unwanted crawler to access websites. This new method suggests Five-factor identification process to detect unwanted crawlers. This work provides the pretest and posttest results along with a systematic evaluation of web pages with the proposed identification technique versus web pages without the proposed identification process. The outputs of logistic regression analysis for both treatment and control groups are provided to evaluate hypotheses and to answer the research questions. An experiment is performed with repeated measures for two groups with each group containing the same web pages. The main goal of this work was to address the challenge of identifying and preventing unwanted web crawlers by proposing a novel defense mechanism with identification process.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.