Developers often perform repetitive code editing activities (up to 70%) for various reasons (e.g., code refactoring) during software development. Many deep learning (DL) models have been proposed to automate code editing by learning from code editing history. Among DL-based models, pre-trained code editing models have achieved state-of-the-art (SOTA) results. Pre-trained models are first pre-trained with pre-training tasks and then fine-tuned on the code editing task. Existing pre-training tasks are mainly code infilling tasks (e.g., masked language modeling), which are derived from the natural language processing field and are not designed for automatic code editing.
In this paper, we propose a novel pre-training task specialized for code editing and present an effective pre-trained code editing model named CodeEditor. Compared to previous code infilling tasks, our pre-training task further improves the performance and generalization ability of code editing models. Specifically, we collect many real-world code snippets as the ground truth and use a powerful generator to rewrite them into mutated versions. Then, we pre-train CodeEditor to edit the mutated versions back into the corresponding ground truth, thereby learning edit patterns.
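As a rough illustration (not the actual implementation), the pre-training pairs can be sketched as follows; toy_mutator is a hypothetical stand-in for the learned generator described above:

import random

def toy_mutator(code: str, rng: random.Random) -> str:
    """Stand-in for the generator: rewrite a snippet into a plausible
    but different ("mutated") version by perturbing one token."""
    tokens = code.split()
    if not tokens:
        return code
    i = rng.randrange(len(tokens))
    tokens[i] = "<mut>"  # toy perturbation; a real generator predicts code
    return " ".join(tokens)

def build_pretraining_pairs(snippets, seed=0):
    """Each pair is (input=mutated code, target=ground-truth code);
    pre-training on such pairs teaches the model to edit mutants back."""
    rng = random.Random(seed)
    return [(toy_mutator(s, rng), s) for s in snippets]

if __name__ == "__main__":
    ground_truth = ["int add(int a, int b) { return a + b; }"]
    for mutated, original in build_pretraining_pairs(ground_truth):
        print("input :", mutated)
        print("target:", original)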
We conduct experiments on four code editing datasets and evaluate the pre-trained CodeEditor in three settings (i.e., fine-tuning, few-shot, and zero-shot). (1) In the fine-tuning setting, we train the pre-trained CodeEditor on the four datasets and evaluate it on the test data.
CodeEditor outperforms the SOTA baselines by 15%, 25.5%, 9.4%, and 26.6% on the four datasets. (2) In the few-shot setting, we train the pre-trained CodeEditor with limited data and evaluate it on the test data. CodeEditor performs substantially better than all baselines, even outperforming baselines that are fine-tuned with all the data. (3) In the zero-shot setting, we evaluate the pre-trained CodeEditor on the test data without any training.
CodeEditor correctly edits 1,113 programs, whereas the SOTA baselines do not work in this setting. The results demonstrate the superiority of our pre-training task and show that the pre-trained CodeEditor is more effective in automatic code editing.