Deep learning methods have found successful applications in fields like image classification and natural language processing. They have recently been applied to source code analysis too, owing to the enormous amount of freely available source code (e.g., from open-source software repositories). In this work, we elaborate upon a state-of-the-art approach for source code representation, which uses information about the code's syntactic structure, and we extend it to represent source code changes (i.e., commits). We use this representation to tackle an industrially relevant task: the classification of security-relevant commits. We leverage transfer learning, a machine learning technique that reuses, or transfers, information learned from previous tasks (commonly called pretext tasks) to tackle a new target task. We assess the impact of using two different pretext tasks, for which abundant labeled data is available, to tackle the classification of security-relevant commits. Our results indicate that representations that exploit the structural information in code syntax outperform token-based representations. Furthermore, we show that pre-training on a small dataset (> 10^4 samples), but for a pretext task that is closely related to the target task, results in better performance metrics than pre-training on a loosely related pretext task with a very large dataset (> 10^6 samples).