2019
DOI: 10.1002/spe.2736

Deep learning the semantics of change sequences for query expansion

Abstract: The overexpansion problem negatively affects the quality of query expansion. To improve the quality of queries for searching code, this paper proposes a DBN-based algorithm for effective query expansion. The deep belief network (DBN) model is trained on code sequences and their change sequences, with the aim of capturing the meaningful terms that arise during the evolution of source code. In contrast to previous studies, the proposed model not only extracts relevant terms to expand a query but also excludes irrelevant ter…

Cited by 20 publications (9 citation statements)
References 27 publications
“…Therefore, two studies [38,137] asked researchers or developers to manually annotate the ground-truth in the codebase, which requires the codebase scale to be amenable to these limited manual efforts. To mitigate issues when using manual efforts, nine studies designed a measurement to score the query-code relevancy (e.g., leveraging a clone detection method to score the similarity between a search code and an example code [46,109]) and determined relevancy if the score is larger than a pre-defined threshold. However, choosing the right threshold is difficult and researchers have tried their best to simulate manual identification.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
“…These venues include a total of 56 studies, 69.1% of the total reviewed studies. These publication venues publish various kinds of code search studies: studies that propose new tools (46), empirical studies (9), case study (1). We can also observe that among these 17 venues, the top-5 popular conferences these works were published are MSR, ICSE, ASE, FSE, and EMSE; meanwhile, the top-5 journals are TSE, TOSEM, SPE, ASEJ, and TSC.…”
Section: Publication Venues and Contribution Types (mentioning)
confidence: 99%
“…Chen et al. proposed a programming-language-independent method for code pattern recognition based on code patterns extracted from Stack Overflow. Some related work distilled crowd knowledge on Stack Overflow to improve query-expansion-based code search. Nie et al. proposed QECK, which performs query reformulation by applying the BM25 model to mine text-processed PRF documents (query-related software repositories) from Stack Overflow.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of the above works refer to only one or two social features. In another work, Zheng et al. mined software repositories using term frequency similarity, extracting the code snippets that share the most terms with another comment segment. There is also much work that takes into account code characteristics.…”
Section: Related Work (mentioning)
confidence: 99%
“…Iti Chaturvedi et al. proposed a Variable-order Belief Network (VBN) framework, which is good at modeling word dependencies in text and can be used for semantic representation of words [38]. Similarly, Huang et al. [39] used the deep belief network (DBN) model to capture the meaningful terms for effective query expansion in the code searching task. The model both extracts relevant terms to expand a query and excludes irrelevant terms from the query, and it outperforms several query expansion algorithms for code search.…”
Section: Introduction (mentioning)
confidence: 99%