2020
DOI: 10.3390/sym12122044

Source Code Authorship Identification Using Deep Neural Networks

Abstract: Many open-source projects are developed by the community and have a common basis. The more source code is open, the more the project is open to contributors. The possibility of accidental or deliberate use of someone else’s source code as closed functionality in another project (even a commercial one) is not excluded. This situation could create copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples for comparing can b…

Cited by 18 publications (18 citation statements)
References: 22 publications
“…There are many studies devoted to establishing authorship of a natural text and source code, the gender and age of the author, determining the sentiment of the text for forensic and social science purposes [5,6]. The early paper [1] provides a detailed review of 2015-2020 studies aimed at determining the author of a text, including approaches based on deep neural networks (NN), classical machine learning (ML) methods, and aspect analysis.…”
Section: Related Work (mentioning)
confidence: 99%
“…The hybrid model can work better than separated networks. This article considers combinations of CNN + CNN, LSTM + CNN, and CNN + LSTM, which showed excellent results in the related problem of determining the author of the software source code [5]. It is worth noting that the popular modern architectures CNN with attention and Transformers in the previous study [1] proved to be less accurate and more time-consuming, so they were not considered in this study.…”
Section: Hybrid Neural Network Models (mentioning)
confidence: 99%
“…For example, "turn your homework" is a 3-gram because it consists of three words. In this case, the probability of "homework" that will appear after "turn your" can be found [27]. The probability is calculated as follows:…”
Section: N-gram Model (mentioning)
confidence: 99%
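
The formula itself is cut off in the quoted statement above. As a rough illustration only, the sketch below uses the standard textbook maximum-likelihood estimate, P(word | context) = count(context + word) / count(context), which is an assumption and may differ from the formula in the citing paper. The function names and the toy corpus are hypothetical and chosen for this example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-token window in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def conditional_probability(tokens, context, word):
    """Maximum-likelihood estimate of P(word | context).

    Assumption: plain relative frequency with no smoothing, i.e.
    count(context + word) / count(context followed by any word).
    """
    context = tuple(context)
    n = len(context) + 1
    full = ngram_counts(tokens, n)
    # Total occurrences of the context that are followed by some word.
    context_total = sum(c for gram, c in full.items() if gram[:-1] == context)
    if context_total == 0:
        return 0.0  # unseen context: no estimate is available without smoothing
    return full[context + (word,)] / context_total


# Toy corpus, invented for illustration only.
corpus = (
    "please turn your homework in and turn your phone off "
    "then turn your homework over"
).split()

# "turn your" occurs three times; two of those are followed by "homework".
p = conditional_probability(corpus, ("turn", "your"), "homework")
print(p)
```

An unsmoothed estimate like this assigns zero probability to any continuation not seen in the corpus, which is why practical n-gram models usually add some form of smoothing.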
“…The relevance of research on this topic is associated with the auxiliary function for solving the problems of text mining [1-7], particularly its attribution [8,9]. The sentiment of the text, as well as the gender and age of the author, are the most informative features used in attribution.…”
Section: Introduction (mentioning)
confidence: 99%