CodeScore: Evaluating Code Generation by Learning Code Execution

Dong, Yihong; Ding, Jiazheng; Xue, Jian; Li, Zhuo; Li, Ge; Jin, Zhi

doi:10.48550/arxiv.2301.09043

2023

DOI: 10.48550/arxiv.2301.09043

|View full text |Cite

Preprint

CodeScore: Evaluating Code Generation by Learning Code Execution

Yihong Dong¹,

Jiazheng Ding²,

Jian Xue³

et al.

Abstract: A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing CEMs can be categorized into match-based CEMs (e.g., BLEU, Accuracy, and Code-BLEU) and execution-based CEMs (e.g., Avg-PassRatio and Pass@k), but both of them suffer from some issues. The former only measures differences in surface form regardless of the functional equivalence of codes, while the latter has huge execution overheads, includi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article2

Relationship

Self Cite0

Independent2

Authors

Journals

Cited by 2 publications

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

Mu,

Shi,

Wang

et al. 2024

Proc. ACM Softw. Eng.

View full text Add to dashboard Cite

Large Language Models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in automatically generating code from provided natural language requirements. However, in real-world practice, it is inevitable that the requirements written by users might be ambiguous or insufficient. Current LLMs will directly generate programs according to those unclear requirements, regardless of interactive clarification, which will likely deviate from the original user intents. To bridge that gap, we introduce a novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions. Specifically, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving question responses, ClarifyGPT refines the ambiguous requirement and inputs it into the same LLM to generate a final code solution. To evaluate our ClarifyGPT, we invite ten participants to use ClarifyGPT for code generation on two benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to conduct large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The results demonstrate that ClarifyGPT can significantly enhance code generation performance compared to the baselines. In particular, ClarifyGPT improves the average performance of GPT-4 and ChatGPT across five benchmarks from 62.43% to 69.60% and from 54.32% to 62.37%, respectively. A human evaluation also confirms the effectiveness of ClarifyGPT in detecting ambiguous requirements and generating high-quality clarifying questions. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.

show abstract

ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

Mu,

Shi,

Wang

et al. 2024

Proc. ACM Softw. Eng.

View full text Add to dashboard Cite

show abstract

Large Language Model Supply Chain: A Research Agenda

Wang,

Zhao,

Hou

et al. 2024

ACM Trans. Softw. Eng. Methodol.

View full text Add to dashboard Cite

The rapid advancement of large language models (LLMs) has revolutionized artificial intelligence, introducing unprecedented capabilities in natural language processing and multimodal content generation. However, the increasing complexity and scale of these models have given rise to a multifaceted supply chain that presents unique challenges across infrastructure, foundation models, and downstream applications. This paper provides the first comprehensive research agenda of the LLM supply chain, offering a structured approach to identify critical challenges and opportunities through the dual lenses of software engineering (SE) and security & privacy (S&P). We begin by establishing a clear definition of the LLM supply chain, encompassing its components and dependencies. We then analyze each layer of the supply chain, presenting a vision for robust and secure LLM development, reviewing the current state of practices and technologies, and identifying key challenges and research opportunities. This work aims to bridge the existing research gap in systematically understanding the multifaceted issues within the LLM supply chain, offering valuable insights to guide future efforts in this rapidly evolving domain.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

CodeScore: Evaluating Code Generation by Learning Code Execution

Cited by 2 publications

References 13 publications

ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

Large Language Model Supply Chain: A Research Agenda

Contact Info

Product

Resources

About