Automated localization of buggy files can improve developers' efficiency in software maintenance, thereby improving the quality of software products. State-of-the-art approaches to bug localization are based on neural networks, e.g., RNNs or CNNs, and learn semantic features from a given bug report. However, these simple neural architectures struggle to learn deep contextual features from bug reports, which hurts the semantic mapping between bug reports and their corresponding buggy files. To resolve this problem, in this paper we propose CoLoc, a bug localization approach that combines pre-trained language models and contrastive learning. Specifically, CoLoc is first pre-trained on a large-scale bug report corpus in an unsupervised way to learn a deep contextual feature for each token in a bug report according to its context. Afterward, CoLoc is further pre-trained with a contrastive learning objective to learn contrastive representations of both bug reports and buggy files. Contrastive learning helps CoLoc learn the semantic differences among bug reports and buggy files. To evaluate the effectiveness of CoLoc, we choose five baseline approaches and compare their performance with CoLoc's on a public dataset. The experimental results show that CoLoc outperforms all baseline approaches and achieves new state-of-the-art results for bug localization.
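The abstract leaves the exact contrastive objective unspecified. As a purely illustrative sketch, not the authors' implementation, the snippet below shows a symmetric InfoNCE-style loss over a batch of matched (bug report, buggy file) embedding pairs, a common choice for this kind of cross-modal contrastive pre-training; the function name `info_nce_loss`, the tensor names, and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(report_emb: torch.Tensor,
                  file_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of (bug report, buggy file) pairs.

    report_emb, file_emb: (batch, dim) embeddings from the two encoders.
    Row i of each tensor comes from the same matched pair, so the diagonal
    of the similarity matrix holds the positives and every off-diagonal
    entry serves as an in-batch negative.
    """
    report_emb = F.normalize(report_emb, dim=-1)
    file_emb = F.normalize(file_emb, dim=-1)
    # (batch, batch) matrix of temperature-scaled cosine similarities.
    logits = report_emb @ file_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull matched pairs together and push mismatched pairs apart,
    # in both the report-to-file and file-to-report directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    # Toy usage with random embeddings standing in for encoder outputs.
    reports = torch.randn(8, 256)
    files = torch.randn(8, 256)
    print(info_nce_loss(reports, files).item())
```

Treating every other file in the batch as a negative for a given report is what lets such an objective separate semantically different reports and files, matching the contrastive-learning intuition described in the abstract.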