2021
DOI: 10.48550/arxiv.2105.09938
Preprint

Measuring Coding Challenge Competence With APPS

Abstract: While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to accurately assess code generation performance rigorously. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the ability of models to take…
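As the abstract describes, APPS-style benchmarks score a model by whether its generated programs behave correctly on held-out test cases. Purely as an illustration, here is a minimal Python sketch of that style of evaluation, assuming stdin/stdout test cases; the file name, helper, and timeout are hypothetical, and this is not the official APPS harness:

# Hypothetical sketch of test-case-based scoring (not the official APPS code).
import subprocess

def run_candidate(source_path, tests, timeout=4.0):
    """Return the fraction of stdin/stdout test cases the candidate passes."""
    passed = 0
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                ["python", source_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed test
        if result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(tests) if tests else 0.0

# Usage with one made-up test case: input "2 3" should print "5".
# run_candidate("candidate.py", [("2 3\n", "5\n")])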

Cited by 26 publications (53 citation statements)
References 26 publications
“…The explainability categories we identified have varied technical feasibility with current techniques, and point to topics that are under-explored for generative AI. For example, for the Performance category, existing works have used the Computational Accuracy metric to evaluate generative code models [9,15,33,88], but not other metrics we uncovered regarding the characteristics of the generated artifacts and run-time efficiency. To understand performance differences and limitations with regard to different types of input, solutions have been explored for natural language generation under Prompt Engineering [57,58].…”
Section: Discussion 6.1 Informing XAI Approaches for GenAI for Code
confidence: 99%
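For concreteness, the Computational Accuracy this statement refers to is typically the share of problems for which a generated program produces the correct outputs on every test case. A hedged sketch of that aggregate, assuming per-problem pass fractions such as those produced by a runner like the one above (names are illustrative, not taken from [9,15,33,88]):

# Hypothetical aggregate metric over per-problem pass fractions.
def computational_accuracy(per_problem_pass_fractions):
    """Fraction of problems solved completely, i.e. all test cases pass."""
    solved = sum(1 for frac in per_problem_pass_fractions if frac == 1.0)
    return solved / len(per_problem_pass_fractions)

# As the statement observes, this captures functional correctness only;
# it says nothing about run-time efficiency or other properties of the
# generated artifacts.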
“…Large-scale natural language modeling has witnessed rapid advances since the inception of the Transformer architecture [46]. Recent works have shown that large language models (LLMs) pre-trained on large unstructured text corpora not only can perform strongly on various downstream NLP tasks [10,33,34,5] but the learned representations can also be used to model relations of entities [20], retrieve matching visual features [17], synthesize code from docstrings [13,7], solve math problems [8,39], and even serve as valuable priors when applied to diverse tasks from different modalities [23,45]. Notably, by pre-training on large-scale data, these models can also internalize an implicit knowledge base containing rich information about the world from which factual answers (e.g.…”
Section: Related Work
confidence: 99%
“…To bypass this limitation, Roziere et al. (2020) used unsupervised neural machine translation techniques to translate between languages using only monolingual corpora, and showed impressive results for translation between Java, C++, and Python. While Roziere et al. (2020) trained the model specifically for code translation, large language models, such as GPT-2 (Radford et al., 2019), GPT-3 (Brown et al., 2020), and Codex, have also been shown to have some competence in generating code (Hendrycks et al., 2021).…”
Section: Related Work
confidence: 99%