Probabilistic topic models have proven to be an extremely versatile class of mixed-membership models for discovering the thematic structure of text collections. There are many possible applications covering a broad range of areas of study: technology, natural science, social science, and the humanities.

In this thesis, a new efficient parallel Markov Chain Monte Carlo inference algorithm is proposed for Bayesian inference in large topic models. The proposed methods scale well with corpus size and can be used for other probabilistic topic models and other natural language processing applications. The proposed methods are fast, efficient, scalable, and converge to the true posterior distribution.

In addition, this thesis proposes a supervised topic model for high-dimensional text classification, with an emphasis on interpretable document prediction using the horseshoe shrinkage prior in supervised topic models.

Finally, we develop a model and inference algorithm that can model the agenda and framing of political speeches over time with a priori defined topics. We apply the approach to analyze the evolution of the immigration discourse in the Swedish parliament by combining theory from political science and communication science with a probabilistic topic model.
Acknowledgments

There are many people that I need to thank for their direct and indirect contributions to this thesis: people who have given their support and personal contributions, and also some who just put up with me through these five very intensive years.

First and foremost, I want to thank my main supervisor Mattias Villani. It has been a privilege to be his student, and I really want to thank him for all the ideas, time, and effort he put into me throughout the years. He has always pushed me to go further, accepting nothing less than high-quality research from me. He also helped me focus on the right things when so many exciting research projects were possible.

My co-supervisor Marco Kuhlmann has also been important during these years, helping me through the difficulties of natural language processing and computational linguistics. Marco's advice and counseling have been invaluable to me.

I am also very grateful to David Mimno, who welcomed me to Cornell University and acted as my supervisor during the fall of 2016. Doing research at Cornell for one semester really helped me gain different perspectives on the latent semantic analysis research field. The way I try to present the different parts of latent semantic analysis in this thesis is heavily influenced by discussions with David and by David's course on advanced topic models.

The most important part of my graduate studies has been learning to be a researcher. I entered graduate school knowing very little about how to do statistical research, especially in the field of probabilistic text modeling and natural language processing. But thanks to my many collaborators, I now feel like I can actually do real research. My research collaborators on different projects have been extremely imp...