Context: Code readability plays a critical role in software maintenance and evolution, and a metric for classifying code readability levels is both applicable and desirable. However, most prior research has treated code readability classification as a binary classification task because of the lack of labeled data. Objective: To support the training of multi-class code readability classification models, we propose an enhanced data augmentation approach. Method: The approach combines domain-specific data transformation with GAN-based data augmentation. With this augmentation approach, we can generate sufficient readability data to train a multi-class code readability model well. Result: A series of experiments is conducted to evaluate our augmentation approach. The experimental results show that the approach reaches a state-of-the-art multi-class code readability classification accuracy of 68.0%, a significant improvement of 6.3% over using only the original data. Conclusion: As an innovative work proposing both multi-class code readability classification and an enhanced code readability data augmentation approach, our method proves to be effective.