Fengji Zhang scite author profile

Fengji Zhang

4 Publications

13 Citation Statements Received

91 Citation Statements Given


Affiliations

Anhui University, Nanjing Normal University

Publications


CodeT: Code Generation with Generated Tests

Chen¹, Zhang², Nguyen³ et al. 2022

Preprint


Given a programming problem, pre-trained language models such as Codex have demonstrated the ability to generate multiple different code solutions via sampling. However, selecting a correct or best solution from those samples remains a challenge. While an easy way to verify the correctness of a code solution is to execute test cases, producing high-quality test cases is prohibitively expensive. In this paper, we explore the use of pre-trained language models to automatically generate test cases, calling our method CODET: CODE generation with generated Tests. CODET executes the code solutions using the generated test cases, and then chooses the best solution based on a dual execution agreement with both the generated test cases and other generated solutions. We evaluate CODET on five different pre-trained models with both the HumanEval and MBPP benchmarks. Extensive experimental results demonstrate that CODET can achieve significant, consistent, and surprising improvements over previous methods. For example, CODET improves pass@1 on HumanEval to 65.8%, an absolute increase of 18.8% on the code-davinci-002 model and an absolute improvement of more than 20% over previous state-of-the-art results. The first three authors contributed equally.
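The dual execution agreement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the scoring rule, so the sketch assumes that solutions passing exactly the same generated tests form a group, and that a group is scored by (number of solutions in the group) × (number of tests it passes). The names `passes`, `select_by_dual_agreement`, `candidates`, and `generated_tests` are hypothetical, and candidate solutions are modelled as plain Python callables rather than sampled source code.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Solution = Callable[[int], int]  # hypothetical: a sampled solution as a callable
TestCase = Tuple[int, int]       # hypothetical: (input, expected output)


def passes(solution: Solution, test: TestCase) -> bool:
    """Run one generated test; any exception counts as a failure."""
    arg, expected = test
    try:
        return solution(arg) == expected
    except Exception:
        return False


def select_by_dual_agreement(
    solutions: Sequence[Solution], tests: Sequence[TestCase]
) -> int:
    """Return the index of the chosen solution.

    Assumed scoring rule: group solutions by the exact set of tests they
    pass, then score each group as len(group) * len(passed tests).
    """
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for i, solution in enumerate(solutions):
        passed = frozenset(j for j, t in enumerate(tests) if passes(solution, t))
        groups[passed].append(i)
    _, best_members = max(
        groups.items(), key=lambda item: len(item[1]) * len(item[0])
    )
    return best_members[0]


# Toy problem: "return the square of x". One generated test is wrong on purpose.
candidates: List[Solution] = [
    lambda x: x * x,   # correct
    lambda x: x ** 2,  # correct, agrees with the first candidate
    lambda x: x * 2,   # wrong, but still passes three of the four tests
    lambda x: x + 1,   # wrong
]
generated_tests: List[TestCase] = [(2, 4), (3, 9), (0, 0), (1, 2)]  # (1, 2) is wrong

best = select_by_dual_agreement(candidates, generated_tests)
print(best)
```

In this toy run the doubling candidate passes as many tests as the correct ones, so counting passed tests alone would not separate them; agreement between the two correct candidates is what tips the score.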


Biophysical climate impact of forests with different age classes in mid- and high-latitude North America

Zhang, Wang et al. 2021

Forest Ecology and Management


Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

Zhang¹, Keung², Yu³ et al. 2022

Information and Software Technology


Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

Zhang¹, Keung², Li³ et al. 2021

Preprint


Context: Stack Overflow is very helpful for software developers who are seeking answers to programming problems. Previous studies have shown that a growing number of questions are of low quality and thus receive less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) to automatically generate question titles from code snippets in order to improve question quality. However, the code snippets in the question body alone cannot provide sufficient information for title generation, and LSTMs cannot capture the long-range dependencies between tokens. Objective: We propose CCBERT, a novel deep learning model that improves question title generation by making full use of the bi-modal information in the entire question body. Methods: CCBERT follows the encoder-decoder paradigm. It uses CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate predicted tokens, and an additional copy attention layer to refine the output distribution. Both the encoder and decoder perform multi-head self-attention to better capture long-range dependencies. To verify the effectiveness of CCBERT, we build a dataset containing more than 120,000 high-quality questions filtered from the data officially published by Stack Overflow. Results: CCBERT achieves better performance on the dataset, outperforming BiLSTM-CC and a multi-purpose pre-trained model (BART) by 14% and 4% on average, respectively. Experiments on code-only and low-resource datasets also show that CCBERT degrades less: its performance drops by 24% and 5%, respectively, compared with 40% and 13.5% for BiLSTM-CC. Conclusion: CCBERT is capable of automatically capturing the bi-modal semantic information from the entire question body and parsing the long-range dependencies to achieve better performance. Therefore, CCBERT is an effective approach for Stack Overflow question title generation.
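To make the idea of a copy attention layer refining the output distribution more concrete, here is a minimal sketch. The abstract does not give the formula CCBERT uses, so this assumes the common pointer-generator style mixture: the final probability of a token is a weighted sum of the decoder's vocabulary probability and the attention mass placed on source positions holding that token. The function `mix_copy_distribution`, the gate `p_gen`, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def mix_copy_distribution(
    vocab_probs: Sequence[float],
    copy_attention: Sequence[float],
    source_token_ids: Sequence[int],
    p_gen: float,
) -> List[float]:
    """Blend a generation distribution with a copy distribution.

    vocab_probs:      decoder probabilities over the vocabulary (sums to 1)
    copy_attention:   attention weights over source positions (sums to 1)
    source_token_ids: vocabulary id of the token at each source position
    p_gen:            assumed gate in [0, 1]; 1.0 means "never copy"
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(copy_attention) != len(source_token_ids):
        raise ValueError("one attention weight is needed per source position")
    # Start from the scaled generation distribution.
    final = [p_gen * p for p in vocab_probs]
    # Add the copy mass onto whichever vocabulary entry each source token maps to.
    for weight, token_id in zip(copy_attention, source_token_ids):
        final[token_id] += (1.0 - p_gen) * weight
    return final


# Toy example: a 4-token vocabulary and a 3-token question body.
vocab_probs = [0.1, 0.2, 0.3, 0.4]
source_token_ids = [1, 3, 1]          # token 1 appears twice in the source
copy_attention = [0.25, 0.5, 0.25]
final_probs = mix_copy_distribution(vocab_probs, copy_attention, source_token_ids, p_gen=0.5)
print(final_probs)
```

Tokens that appear in the question body receive extra probability mass, which is how a copy mechanism of this kind lets rare identifiers from the body surface in the generated title.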

