Improved bug localization based on code change histories and bug reports

Youm, Klaus Changsun; Ahn, June; Lee, Eunseok

doi:10.1016/j.infsof.2016.11.002

Cited by 92 publications

(78 citation statements)

References 10 publications


“…Fine‐grained code changes are highly repetitive. It sets forth a new direction in mining code changes to actively help users during development (code completion, code refactoring or bug fixes, etc). In 2016, Nguyen et al took advantage of the repetitiveness of fine‐grained code changes to improve the code completion.…”

Section: Related Work (mentioning)

confidence: 99%

Query expansion based on statistical learning from code changes

Huang, Yang, Xue, et al. 2018

Softw Pract Exp

Thesaurus-based, code-related, and software-specific query expansion techniques are the main contributions in free-form query search. However, these techniques still could not put the most relevant query result in the first position because they lack the ability to infer the expansion words that represent the user needs based on a given query. In this paper, we discover that code changes can imply what users want and propose a novel query expansion technique with code changes (QECC). It exploits (changes, contexts) pairs from changed methods. On the basis of statistical learning from pairs, it can infer code changes for a given query. In this way, it expands a query with code changes and recommends the query results that meet actual needs perfectly. In addition, we implement InstaRec to perform QECC and evaluate it with 195 039 change commits from GitHub and our code tracker. The results show that QECC can improve the precision of 3 code search algorithms (ie, IR, Portfolio, and VF) by up to 52% to 62% and outperform the state-of-the-art query expansion techniques (ie, query expansion based on crowd knowledge and CodeHow) by 13% to 16% when the top 1 result is inspected.

Keywords: code changes, code search, information retrieval, software reuse, statistical learning, query expansion

Introduction: As code repositories (eg, CodePlex, GitHub, and SourceForge) become available [1], code search has become a common activity during software development [2,3]. Especially, users are more interested in the free-form query search, which allows users to type natural language keywords to define queries [4]. The performance of this search strongly depends on word matches between queries and query results. However, queries and query results do not often use the same words [5]. Even the length of a query is usually short. Sadowski et al reported that the average number of words per query is 1.85 for the queries proposed to Google search [6]. Obviously, it is not an easy task to formulate a good query. This motivates the query expansion techniques [7,8]. Earlier, WordNet [9] reformulates a query with synonyms in a word thesaurus. However, Lu et al [10] showed that the general English-based similarity measurements of WordNet could not effectively suggest similar words


“…The BLIA [17] tool localizes bugs on the levels of a file and of a method. The authors utilized the revision history, file contents, bug reports with comments and stack traces to find suspicious files.…”

Section: Related Work (mentioning)

confidence: 99%

“…This process can be additionally enhanced by extracting stack traces from bug reports [11]. More complex systems use a composition of existing algorithms, by using linear combinations of ranking scores [15], [16], [17] or by using learning to rank algorithms [12], [13].…”

Section: Introduction (mentioning)

confidence: 99%
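The "linear combinations of ranking scores" mentioned in the statement above can be illustrated with a minimal sketch. It is not the implementation of any cited tool: the score sources (`text_similarity`, `stack_trace`, `commit_history`), the weights, and the file names are hypothetical, and min-max normalization is an assumed preprocessing step.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one score source to [0, 1] so sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All files scored the same: this source carries no ranking signal.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def combine_scores(
    score_sets: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores per file, most suspicious first."""
    combined: Dict[str, float] = {}
    for source, scores in score_sets.items():
        weight = weights.get(source, 0.0)  # unknown sources contribute nothing
        for file_name, score in min_max_normalize(scores).items():
            combined[file_name] = combined.get(file_name, 0.0) + weight * score
    # Sort by descending score; break ties by file name for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-source suspiciousness scores for one bug report.
score_sets = {
    "text_similarity": {"A.java": 0.9, "B.java": 0.3, "C.java": 0.1},
    "stack_trace": {"A.java": 0.0, "B.java": 1.0},
    "commit_history": {"B.java": 0.5, "C.java": 0.7},
}
weights = {"text_similarity": 0.5, "stack_trace": 0.3, "commit_history": 0.2}

ranking = combine_scores(score_sets, weights)
for file_name, score in ranking:
    print(f"{file_name}: {score:.3f}")
```

A file missing from a source simply receives no contribution from it, which is one possible design choice; a real system might instead assign a default score.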

Tracking Buggy Files: New Efficient Adaptive Bug Localization Algorithm

Fejzer, Narębski, Przymus, et al. 2022

IEEE Trans. Software Eng.

Upon receiving a new bug report, developers need to find its cause in the source code. Bug localization can be helped by a tool that ranks all source files according to how likely they include the bug. This problem was thoroughly examined by numerous scientists. We introduce a novel adaptive bug localization algorithm. The concept behind it is based on new feature weighting approaches and an adaptive selection algorithm utilizing pointwise learn-to-rank method. The algorithm is evaluated on publicly available datasets, and is competitive in terms of accuracy and required computational resources compared to state-of-the-art. Additionally, to improve reproducibility we provide extended datasets that include computed features and partial steps, and we also provide the source code.


“…For example, to fix the buggy program, e.g., multiple buggy lines, the fault localization technique should be performed. Many researchers have proposed techniques to find buggy code lines using the information retrieval model [21] and Latent Dirichlet Allocation [22]. However, the aims of these fault localization studies are different from those of fault localization for automatic fault repair, in the sense that all the correctly identified lines are used together for repair.…”

Section: Fault Localization (mentioning)

confidence: 99%

Applying Genetic Programming with Similar Bug Fix Information to Automatic Fault Repair

et al. 2018


Owing to the high complexity of recent software products, developers cannot avoid major/minor mistakes, and software bugs are generated during the software development process. When developers manually modify a program source code using bug descriptions to fix bugs, their daily workloads and costs increase. Therefore, we need a way to reduce their workloads and costs. In this paper, we propose a novel automatic fault repair method by using similar bug fix information based on genetic programming (GP). First, we searched for buggy source code similar to the newly given buggy code, and then we searched for the fixed version of the most similar buggy code. Next, we transformed the fixed code into abstract syntax trees for applying GP and generated the candidate program patches. In this step, we verified the candidate patches by using a fitness function based on given test cases to determine whether the patch was valid or not. Finally, we produced program patches to fix the new given buggy code.
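The patch validation step described in this abstract can be sketched roughly as follows. This is an assumed, simplified fitness function (the fraction of test cases passed), not the one used in the paper; representing candidate patches as Python callables, and the names `buggy_abs` and `patched_abs`, are purely illustrative.

```python
from typing import Any, Callable, List, Tuple

# A test case pairs positional arguments with the expected return value.
TestCase = Tuple[tuple, Any]


def fitness(candidate: Callable[..., Any], tests: List[TestCase]) -> float:
    """Return the fraction of test cases the candidate patch passes."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that test case.
            pass
    return passed / len(tests)


def is_valid_patch(candidate: Callable[..., Any], tests: List[TestCase]) -> bool:
    """Assumption: a patch counts as valid only if it passes every test."""
    return fitness(candidate, tests) == 1.0


def buggy_abs(x: int) -> int:
    return x  # hypothetical bug: negative inputs are returned unchanged


def patched_abs(x: int) -> int:
    return -x if x < 0 else x  # hypothetical candidate patch


tests: List[TestCase] = [((3,), 3), ((-2,), 2), ((0,), 0)]
print(fitness(buggy_abs, tests), is_valid_patch(buggy_abs, tests))
print(fitness(patched_abs, tests), is_valid_patch(patched_abs, tests))
```

In a GP loop, a score like this would typically guide selection among candidate patches, with partially passing candidates kept as material for further mutation and crossover.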
