2014
DOI: 10.1142/s0219649214500282

Source Code Plagiarism Detection Using Biological String Similarity Algorithms

Abstract: Source code plagiarism is easy to commit but difficult to catch. Many approaches have been proposed in the literature to automate its detection; however, there is little consensus on what works best. In this paper, we propose two new measures for determining the accuracy of a given technique and describe an approach to convert code files into strings which can then be compared for similarity in order to detect plagiarism. We then compare several string comparison techniques, heavily utilised in the area of biologic…

Cited by 6 publications (4 citation statements)
References 15 publications

“…Since it is not provided with labels, the learning system must discover how the information is organized on its own. Clustering algorithms are also a part of this group of algorithms [3].…”
Section: Un-supervised Machine Learning (mentioning)
confidence: 99%
“…Plagiarism, on the other hand, is defined as the copying of written materials and computer code. Source code plagiarism is defined as attempting to pass off another person's source code as one's own while omitting to recognize which precise sections were copied from which author [3]. Plagiarism in source code is common in academic programming coursework [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…Such methods, as stated in [21], involve more complex as well as robust approaches. Normally, source code files are treated as text files, hence, common methods such as the traditional Bag-of-Words, character n-grams [12,22], and longest common sub-sequence [2,15] are among the most popular techniques.…”
Section: Related Work (mentioning)
confidence: 99%
“…This practice is defined as source code plagiarism by Cosma and Joy (2008). It involves obtaining source code either with or without the permission of the original author, and submitting the copied code with no, minor, or even major modifications aimed at concealing plagiarism (Rahal & Wielga, 2014).…”
Section: Introduction (mentioning)
confidence: 99%
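
The Related Work statement above names character n-grams and longest common sub-sequence as popular text-based techniques for comparing source files. As a rough illustration only, the Python sketch below scores two code strings with both. It is not the method of the cited paper or of any citing publication; the function names, the normalisation formulas, and the default n-gram size of 3 are all assumptions chosen for the example.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 2 * LCS / (len(a) + len(b)), giving a score in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of overlapping character n-grams (n = 3 is an arbitrary choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over the two n-gram multisets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = sum((grams_a | grams_b).values())
    if union == 0:  # both inputs shorter than n characters
        return 1.0
    return sum((grams_a & grams_b).values()) / union


if __name__ == "__main__":
    original = "for i in range(n): total += values[i]"
    renamed = "for k in range(n): acc += data[k]"
    print(f"LCS similarity:    {lcs_similarity(original, renamed):.2f}")
    print(f"3-gram similarity: {ngram_jaccard(original, renamed):.2f}")
```

Both scores operate on raw characters, so renaming identifiers lowers them even when the program structure is unchanged; that sensitivity is one reason a source file is often converted to a more abstract string before comparison.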