Retrieval from software libraries for bug localization

Rao, Shivani; Kak, Avinash C.

doi:10.1145/1985441.1985451

Cited by 199 publications

(26 citation statements)

References 26 publications

“…We could not find a bug dataset for C# projects, like iBUGS (Dallmeier and Zimmermann 2016) or moreBugs (Rao and Kak 2013b). Then, we used GitHub search functionality to obtain a list of large C# projects, by searching for projects with 1000 or more stars and 100 or more forks.…”

Section: Project Selection (mentioning)

confidence: 99%

“…To foster the process of effectively identifying source code that is relevant to a particular bug report, a number of techniques have been developed using information retrieval (IR) models such as Latent Dirichlet Allocation (LDA) (Lukins et al 2010), Latent Semantic Analysis (LSA) (Rao and Kak 2011), and Vector Space Model (VSM) (Zhou et al 2012). The IR approach to bug localization generally consists of treating source files as documents, against which a query, represented by the bug report, will be run.…”

Section: Introduction (mentioning)

confidence: 99%

On the influence of program constructs on bug localization effectiveness

Garnier

Ferreira

Garcia

2017

J Softw Eng Res Dev

Software projects often reach hundreds or thousands of files. Therefore, manually searching for code elements that should be changed to fix a failure is a difficult task. Static bug localization techniques provide cost-effective means of finding files related to the failure described in a bug report. Structured information retrieval (IR) has been successfully applied by techniques such as BLUiR, BLUiR+, and AmaLgam. However, there are significant shortcomings in how these techniques were evaluated. First, virtually all evaluations have been limited to very few projects written in only one object-oriented programming language, particularly Java. Second, it might be that particular constructs of different programming languages, such as C#, play a role in the effectiveness of bug localization techniques. However, little is known about this phenomenon. Third, the experimental setups of most bug localization studies make simplistic assumptions that do not hold in real-world scenarios, thereby raising doubts about the reported effectiveness of existing techniques. In this article, we evaluate BLUiR, BLUiR+, and AmaLgam on 20 C# projects, addressing the aforementioned shortcomings from previous studies. Then, we extend AmaLgam's algorithm to understand if structured information retrieval can benefit from the use of a wider range of program constructs, including C# constructs absent from Java. We also perform an analysis of the influence of program constructs on bug localization effectiveness using Principal Component Analysis (PCA). Our analysis points to Methods and Classes as the constructs that contribute the most to the effectiveness of bug localization. It also reveals a significant contribution from Properties and String literals, constructs not considered in previous studies. Finally, we evaluate the effects of changing the emphasis on particular constructs by making another extension to AmaLgam's algorithm, enabling the specification of different weights for each construct. Our results show that fine-tuning these weights may increase the effectiveness of bug localization in projects structured with a specific programming language, such as C#.

“…Similar to our previous work [10], we have used the moreBugs [23] dataset to perform our experimental validation. The dataset contains all the necessary information to evaluate both the batch-mode and the incremental-mode approaches to IR based bug localization, namely: (a) the commit-level changes taking place in the repository; (b) the release history of the software; and (c) a set of closed/resolved issues/bugs.…”

Section: Experimental Validation 4.1 The Dataset (mentioning)

confidence: 99%

“…We also present strategies for retraining the model after a sequence of commits or for large commits (commits that affect a significant portion of the source code) in order to keep the incrementally updated model close to the true model. In order to evaluate our incremental model update framework, we have created a benchmark dataset called moreBugs [23] that tracks commit-level changes over 10 years of developmental history of two software repositories: JodaTime and AspectJ. …”

Section: Introduction (mentioning)

confidence: 99%

Comparing Incremental Latent Semantic Analysis Algorithms for Efficient Retrieval from Software Libraries for Bug Localization

Rao

Medeiros

Kak

2015

SIGSOFT Softw. Eng. Notes

Self Cite

The problem of bug localization is to identify the source files related to a bug in a software repository. Information Retrieval (IR) based approaches create an index of the source files and learn a model which is then queried with a bug for the relevant files. In spite of the advances in these tools, the current approaches do not take into consideration the dynamic nature of software repositories. With the traditional IR based approaches to bug localization, the model parameters must be recalculated for each change to a repository. In contrast, this paper presents an incremental framework to update the model parameters of the Latent Semantic Analysis (LSA) model as the data evolves. We compare two state-of-the-art incremental SVD update techniques for LSA with respect to the retrieval accuracy and the time performance. The dataset we used in our validation experiments was created from mining 10 years of version history of AspectJ and JodaTime software libraries.

“…We have therefore created a new and publicly available benchmark dataset called moreBugs [22] by mining ten years of commit history for AspectJ and JodaTime projects. …”

Section: Experimental Validation A. The Evaluation Dataset (mentioning)

confidence: 99%

An incremental update framework for efficient retrieval from software libraries for bug localization

Rao

Medeiros

Kak

2013

2013 20th Working Conference on Reverse Engineering (WCRE)

Self Cite

Information Retrieval (IR) based bug localization techniques use bug reports to query a software repository to retrieve relevant source files. These techniques index the source files in the software repository and train a model which is then queried for retrieval purposes. Much of the current research is focused on improving the retrieval effectiveness of these methods. However, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, the index creation and model learning have to be repeated to ensure accuracy of retrieval for each new bug. In doing so, the query latency may be unreasonably high, and also, re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework to continuously update the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved but with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques: Vector Space Model (VSM) and Smoothed Unigram Model (SUM). The dataset we used in our validation experiments was created by tracking commit history of AspectJ and JodaTime software libraries over a span of 10 years.
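
The per-commit update idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `IncrementalIndex`, the regex tokenizer, and the `log(1 + N/df)` weighting are all assumptions chosen for brevity. The sketch keeps term counts and document frequencies up to date for changed files only, and derives TF-IDF weights at query time, so a commit never triggers a full re-index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class IncrementalIndex:
    """Hypothetical VSM index that is updated per commit instead of rebuilt."""

    def __init__(self):
        self.term_counts = {}      # file path -> Counter of term frequencies
        self.doc_freq = Counter()  # term -> number of files containing it

    def _remove(self, path):
        # Undo a file's contribution to the document frequencies.
        old = self.term_counts.pop(path, None)
        if old:
            for term in old:
                self.doc_freq[term] -= 1
                if self.doc_freq[term] == 0:
                    del self.doc_freq[term]

    def apply_commit(self, changed, deleted=()):
        """changed: {path: new contents}; deleted: iterable of removed paths."""
        for path in deleted:
            self._remove(path)
        for path, text in changed.items():
            self._remove(path)  # drop stale statistics before re-adding
            counts = Counter(tokenize(text))
            self.term_counts[path] = counts
            self.doc_freq.update(counts.keys())

    def _weights(self, counts):
        # TF-IDF weights; the IDF form here is an assumption, not from the source.
        n = len(self.term_counts)
        return {t: tf * math.log(1 + n / self.doc_freq[t])
                for t, tf in counts.items() if t in self.doc_freq}

    def query(self, bug_report, top_k=3):
        """Rank files by cosine similarity between the bug report and each file."""
        q = self._weights(Counter(tokenize(bug_report)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        scores = []
        for path, counts in self.term_counts.items():
            d = self._weights(counts)
            dot = sum(w * d.get(t, 0.0) for t, w in q.items())
            if dot > 0:
                d_norm = math.sqrt(sum(w * w for w in d.values()))
                scores.append((dot / (q_norm * d_norm), path))
        return [path for _, path in sorted(scores, reverse=True)[:top_k]]


if __name__ == "__main__":
    index = IncrementalIndex()
    index.apply_commit({
        "Parser.java": "class Parser parse date text",
        "Cache.java": "class Cache evict entry",
    })
    print(index.query("crash when parsing date text"))
```

Only the files named in a commit are touched by `apply_commit`; unchanged files keep their stored term counts, which is the redundancy the abstract argues a full rebuild wastes.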
