Deep learning the semantics of change sequences for query expansion

Huang, Qing; Yang, Yang; Cheng, Ming

doi:10.1002/spe.2736

Cited by 20 publications

(9 citation statements)

References 27 publications

“…Therefore, two studies [38,137] asked researchers or developers to manually annotate the ground-truth in the codebase, which requires the codebase scale to be amenable to these limited manual efforts. To mitigate issues when using manual efforts, nine studies designed a measurement to score the query-code relevancy (e.g., leveraging a clone detection method to score the similarity between a search code and an example code [46,109]) and determined relevancy if the score is larger than a pre-defined threshold. However, choosing the right threshold is difficult and researchers have tried their best to simulate manual identification.…”

Section: Evaluation Methods (mentioning)

confidence: 99%

“…These venues include a total of 56 studies, 69.1% of the total reviewed studies. These publication venues publish various kinds of code search studies: studies that propose new tools (46), empirical studies (9), case study (1). We can also observe that among these 17 venues, the top-5 popular conferences these works were published are MSR, ICSE, ASE, FSE, and EMSE; meanwhile, the top-5 journals are TSE, TOSEM, SPE, ASEJ, and TSC.…”

Section: Publication Venues and Contribution Types (mentioning)

confidence: 99%

Opportunities and Challenges in Code Search Tools

Liu, Xia, et al. 2020

Preprint

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve expected code from a large-scale codebase. However, there is a lack of a comprehensive comparative summary of existing code search approaches. To understand the research trends in existing code search studies, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies, analyzed key components, such as codebase, query, and modeling technique used to build code search tools, and classified existing tools into focusing on supporting seven different search tasks. Based on our findings, we identified a set of outstanding challenges in existing studies and a research roadmap for future code search research.

CCS Concepts: • Software and its engineering → Search-based software engineering.

“…Chen et al proposed a programming language independent method for code pattern recognition based on code patterns extracted from Stack Overflow. Some related work distilled crowd knowledge on Stack Overflow to improve query‐expansion‐based code search. Nie et al proposed QECK, which performs query reformulation by applying BM25 model to mine text‐processed PRF documents (query‐related software repositories) from StackOverflow.…”

Section: Related Work (mentioning)

confidence: 99%

“…Most of these above works only refer to one or two social features. Another work, Zheng et al mined the software repositories using the term frequency similarity, which extracts the code snippets to share the most terms with another comment segment. There is also much work that takes into account code characteristics.…”

Section: Related Work (mentioning)

confidence: 99%

Unsupervised software repositories mining and its application to code search

Peng, Yihan, et al. 2019

Softw Pract Exp

Software repositories are crucial resources for many software tasks, including code retrieval and annotation. Programming forums provide questions and answers (Q&A) from software developers, containing abundant code-description posts for exchanging knowledge about programming issues. However, most posts provide personal opinions of users that are often not adequately confirmed or outdated. Mining software repositories in such open and unrestricted forums is challenging. Since the posts can be arbitrary and noisy, it is difficult to get unified labels for supervised noise elimination. Different from existing mining approaches, this paper proposes Code-Description Mining Framework (CodeMF), an unsupervised framework to eliminate noisy posts and extract high quality software repositories from programming forums. CodeMF treats all social features of the posts as discrete-time signals for kernel principal component analysis and further performs wavelet transform feature fusion to find the delicate changes (noises in temporal signals). We conduct comprehensive experiments on StackOverflow. Experimental results demonstrate that CodeMF can effectively reduce running time and improve precision via mining high-quality software repositories for various programming languages, especially for the large-scale codebases. To further illustrate the effect of CodeMF applied in software tasks, we introduce it to improve the performance of query-expansion code search. Meanwhile, for SQL and C# programs, compared to the state-of-the-art query-expansion method QECK, the improvement of QECK CodeMF is 2% and 6% on Recall@10, and 4% and 14% on mean reciprocal rank, respectively.

“…Iti Chaturvedi et al proposed a Variable-order Belief Network (VBN) framework, which is good at modeling word dependencies in text, can be used for semantic representation of words [38]. Similarly, Huang et al [39] used the deep belief network (DBN) model to capture the meaningful terms for effective query expansion in the code searching task. The model both extracts relevant terms to expand a query and excludes irrelevant terms from the query and outperforms several query expansion algorithms for code search.…”

Section: Introduction (mentioning)

confidence: 99%

Using NLP in openEHR archetypes retrieval to promote interoperability: a feasibility study in China

Sun, Zhang, et al. 2021

BMC Med Inform Decis Mak

Background With the development and application of medical information system, semantic interoperability is essential for accurate and advanced health-related computing and electronic health record (EHR) information sharing. The openEHR approach can improve semantic interoperability. One key improvement of openEHR is that it allows for the use of existing archetypes. The crucial problem is how to improve the precision and resolve ambiguity in the archetype retrieval. Method Based on the query expansion technology and Word2Vec model in Nature Language Processing (NLP), we propose to find synonyms as substitutes for original search terms in archetype retrieval. Test sets in different medical professional level are used to verify the feasibility. Result Applying the approach to each original search term (n = 120) in test sets, a total of 69,348 substitutes were constructed. Precision at 5 (P@5) was improved by 0.767, on average. For the best result, the P@5 was up to 0.975. Conclusions We introduce a novel approach that using NLP technology and corpus to find synonyms as substitutes for original search terms. Compared to simply mapping the element contained in openEHR to an external dictionary, this approach could greatly improve precision and resolve ambiguity in retrieval tasks. This is helpful to promote the application of openEHR and advance EHR information sharing.
