The World Wide Web Conference 2019
DOI: 10.1145/3308558.3313632

CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

Abstract: To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work,…
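As a rough illustration of the two tasks defined in the abstract, here is a minimal sketch of their input/output interfaces; the function names and the token-overlap scoring are placeholders for illustration only, not part of CoaCor itself.

```python
from typing import List

def retrieve_code(query: str, code_base: List[str], top_k: int = 5) -> List[str]:
    """Code retrieval: rank snippets in a code base by relevance to a natural
    language query. The token-overlap score is a stand-in for a learned model."""
    def score(snippet: str) -> int:
        return len(set(query.lower().split()) & set(snippet.lower().split()))
    return sorted(code_base, key=score, reverse=True)[:top_k]

def annotate_code(snippet: str) -> str:
    """Code annotation: produce a natural language description for a code snippet.
    A real system would use a trained generator; this stub only echoes a trivial one."""
    return f"code snippet with {len(snippet.split())} tokens"
```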

Citations: cited by 92 publications (113 citation statements)
References: 57 publications (145 reference statements)

“…For example, the BLEU score [24] of the code summarization model in CoaCor is not satisfactory, even though it improves the performance of existing code retrieval models significantly. Different from what is claimed in [34], we respectfully argue that generating summaries close to human-provided queries naturally fits code retrieval. The compromise in BLEU score, which represents the similarity between the generated summaries and human-written ones, can be avoided if we can model the inner connection between the two tasks better.…”
Section: Introduction (contrasting)
confidence: 79%
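For context on the BLEU discussion in the statement above, the following is a minimal, hedged example of how BLEU measures the similarity between a generated summary and a human-written reference using NLTK; the sentence pair is invented for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example pair: a human-written description vs. a generated annotation.
reference = "convert a list of strings into a single comma separated string".split()
generated = "join list elements into a comma separated string".split()

# BLEU rewards n-gram overlap with the human reference; smoothing prevents a
# zero score when some higher-order n-grams have no match at all.
score = sentence_bleu([reference], generated,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```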
“…Two VAEs are trained jointly to reconstruct their inputs as faithfully as possible, with a regularization term that captures the closeness between the latent variables of code and description, which is then used for measuring similarity. Similarly, Yao et al. [34] constructed a neural network-based code annotation model to describe the functionality of an entire code snippet. It produces meaningful words that can be used for code retrieval, where these words and a natural language query are projected into a vector space to measure the cosine similarity between them.…”
Section: Related Work 6.1 Code Retrieval (mentioning)
confidence: 99%
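The retrieval mechanism described above, projecting the generated words and a natural language query into a vector space and comparing them by cosine similarity, can be sketched as follows; the hashed bag-of-words embedding is only a placeholder for a learned encoder.

```python
import numpy as np

def embed(text: str, dim: int = 128) -> np.ndarray:
    """Placeholder encoder: a hashed bag-of-words vector standing in for a
    learned embedding of queries or generated annotation words."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

query = "sort a dictionary by value"
annotation = "sort dict entries by value in descending order"
print(cosine_similarity(embed(query), embed(annotation)))
```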
“…Several of the latest code search techniques that find code given a natural language query rely on machine learning (e.g., NCS [10], DeepCS [8], UNIF [38], MMAN [39], TBCAA [40], and CoaCor [41]). NCS proposes an enhanced word embedding for a natural language query [10].…”
Section: Code Search Systems (mentioning)
confidence: 99%
“…This technique aims to capture semantics by incorporating API call information into ASTs, which would otherwise be abstracted as the same AST node type. CoaCor [41] uses reinforcement learning to build a code annotation framework for effective code retrieval. By generating detailed code annotations using multiple keywords, CoaCor improves the performance of existing code retrieval models.…”
Section: Code Search Systems (mentioning)
confidence: 99%
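As a hedged sketch of the idea attributed to CoaCor here, rewarding a generated annotation by how well it retrieves the code it describes, the snippet below computes a reciprocal-rank-style reward against a pool of distractor snippets; the token-overlap scorer is a simple stand-in, not the paper's actual retrieval model.

```python
from typing import List

def retrieval_reward(annotation: str, target_code: str, distractors: List[str]) -> float:
    """Reward = reciprocal rank of the target snippet when all candidates are
    scored against the generated annotation (an MRR-style signal). In an RL
    setup, this reward would drive policy-gradient updates of the generator."""
    def score(code: str) -> int:
        # Placeholder relevance: token overlap between annotation and code.
        return len(set(annotation.lower().split()) & set(code.lower().split()))

    candidates = [target_code] + distractors
    ranked = sorted(candidates, key=score, reverse=True)
    return 1.0 / (ranked.index(target_code) + 1)

print(retrieval_reward("join strings with a comma separator",
                       "result = ','.join(strings)",
                       ["items = sorted(d.items())", "text = open(path).read()"]))
```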