The object of the study is code in the Python programming language, analyzed by machine learning methods to identify clones. This work is devoted to the study of machine learning methods and implementation of the decision tree machine learning model in the problem of finding clones in the program code. The paper also analyzes existing machine learning approaches for detecting duplicates in program code. During the comparison, the advantages and disadvantages of each algorithm were determined, and the results were summarized in the corresponding comparison tables. As a result of the analysis, it was determined that the method based on the decision tree, which gives the best result in the task of finding clones in the program code, is the most optimal both from the point of view of accuracy and from the point of view of implementation. The result of the work is a created model that, with an accuracy of more than 99 %, classifies cloned and non-cloned codes on an automatically generated dataset in a minimal amount of time. This system has several open questions for future research, the list of which is presented in this work. The proposed model has the following ways of further development: – recognition of clones rewritten from one programming language to another; – detection of vulnerabilities in the code; – improvement of model performance by creating more universal datasets. The perspective of the work lies in training a decision tree model for accurate and fast detection of code clones, which can potentially be widely used for plagiarism detection in both educational institutions and IT companies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.