2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) 2020
DOI: 10.1109/saner48275.2020.9054840
Are the Code Snippets What We Are Searching for? A Benchmark and an Empirical Study on Code Search with Natural-Language Queries

Cited by 50 publications (25 citation statements)
References 37 publications
“…Next, we trace them in the IJaDataset files, by following their references from the BigCloneBench dataset, and put them in our search corpus list (Tracing). Afterwards, we normalize each clone. We do not use comments, as they have been reported to be an unreliable and inconsistent source for extracting natural-language documents [54], [55]. Similarly, software projects can be poorly documented.…”
Section: B. Identifier Extraction
confidence: 99%
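The identifier-based extraction this excerpt describes, taking natural-language tokens from identifiers rather than comments, can be sketched as follows. This is an illustrative sketch only; the function name and regexes are assumptions, not the cited paper's actual code.

```python
import re

def extract_identifier_tokens(source: str) -> list[str]:
    """Split code identifiers into lowercase word tokens, approximating
    the normalization step described above (illustrative sketch only)."""
    # Collect candidate identifiers with a simple word matcher.
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    tokens = []
    for ident in identifiers:
        # Split snake_case first, then camelCase / PascalCase runs.
        for part in ident.split("_"):
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens if t]

print(extract_identifier_tokens("readFileToString"))
# → ['read', 'file', 'to', 'string']
```

Tokens produced this way feed the search corpus directly, sidestepping the unreliable comments the excerpt warns about.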
“…Therefore, performance measures such as recall are not of major concern, as they only indicate whether an information retrieval system fails to report some relevant results. Previously, many researchers also did not report recall because of the similar nature of the problem (Keivanloo, Rilling & Zou, 2014; Lv et al., 2015; Gu, Zhang & Kim, 2018; Yan et al., 2020). Based on these reasons, we choose MRR, top-k accuracy and precision metrics to determine the performance of our information retrieval system.…”
Section: Empirical Evaluation
confidence: 99%
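The MRR and top-k accuracy metrics named in this excerpt are standard in code-search evaluation. A minimal sketch, assuming each query has at most one relevant result at a known 1-based rank (0 meaning the relevant result was not retrieved):

```python
def mean_reciprocal_rank(ranks: list[int]) -> float:
    """MRR over queries; `ranks` holds the 1-based rank of the first
    relevant result per query (0 = relevant result not retrieved)."""
    return sum(1.0 / r for r in ranks if r > 0) / len(ranks)

def top_k_accuracy(ranks: list[int], k: int) -> float:
    """Fraction of queries whose first relevant result is within the top k."""
    return sum(1 for r in ranks if 0 < r <= k) / len(ranks)

ranks = [1, 3, 0, 2]                 # first-relevant ranks for four queries
print(mean_reciprocal_rank(ranks))   # (1 + 1/3 + 0 + 1/2) / 4 ≈ 0.458
print(top_k_accuracy(ranks, 2))      # 2 of 4 queries hit within top-2 → 0.5
```

Both metrics reward placing the relevant snippet near the top of the result list, which matches how developers actually scan code-search results.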
“…Lucene is a popular search library for the development of various information retrieval solutions because of its scalability, high performance, and efficient search algorithms [42]. It has been shown to answer the highest number of queries compared to other code search approaches [43].…”
Section: Code Search Systems
confidence: 99%
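Lucene's core retrieval mechanism is an inverted index mapping terms to the documents containing them. A toy sketch of that idea (an illustration of the data structure only, not Lucene's actual API or implementation):

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index: term -> set of doc ids (not Lucene's API)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """AND-semantics lookup: return docs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {"d1": "read file to string", "d2": "sort list of strings"}
index = build_index(docs)
print(search(index, "file string"))  # → {'d1'}
```

Because each query term is a direct dictionary lookup followed by set intersections, query cost scales with posting-list sizes rather than corpus size, which is the scalability property the excerpt credits to Lucene.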