Source Code Plagiarism Detection Using Biological String Similarity Algorithms

Rahal, Imad; Wielga, Colin

doi:10.1142/s0219649214500282

Cited by 6 publications

(4 citation statements)

References 15 publications


“…Since it is not provided with labels, the learning system must discover how the information is organized on its own. Clustering algorithms are also a part of this group of algorithms [3].…”

Section: Un-supervised Machine Learning (mentioning)

confidence: 99%

“…Plagiarism, on the other hand, is defined as the copying of written materials and computer code. Source code plagiarism is defined as attempting to pass off another person's source code as one's own while omitting to recognize which precise sections were copied from which author [3]. Plagiarism in source code is common in academic programming coursework [4].…”

Section: Introduction (mentioning)

confidence: 99%


Detecting Source Code Plagiarism in Student Assignment Submissions Using Clustering Techniques

Raddam Sami Mehsen, Majharoddin M. Kazi, Hiren Joshi

2024


In pragmatic courses, graduate students are required to submit programming assignments, which have been susceptible to various forms of plagiarism. Detecting counterfeited code in an academic setting is of paramount importance, given the prevalence of publications and papers. Plagiarism, defined as the unauthorized replication of written work without proper acknowledgment, has become a critical concern with the advent of information and communication technology (ICT) and the widespread availability of scholarly publications online. However, the extensive use of freeware text editors has posed challenges in detecting source code plagiarism. Numerous studies have investigated algorithms for revealing different types of plagiarism and detecting source code plagiarism. In this research, we propose an innovative strategy that combines TF-IDF (Term Frequency-Inverse Document Frequency) modifications with K-means clustering, achieving a remarkable precision rate of 99.2%. Additionally, we explore the hierarchical clustering method, which estimates an even higher precision rate of 99.5% compared to previous techniques. To implement our approach, we utilize the Python programming language along with relevant libraries, providing a robust and efficient system for source code plagiarism detection in student assignment submissions.
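The abstract above combines TF-IDF weighting with K-means clustering but does not spell out its "TF-IDF modifications" or which libraries it uses. The following is a minimal sketch of the general idea only, assuming plain (unmodified) smoothed TF-IDF over a simple regex tokenizer and a standard-library K-means with a deterministic farthest-point initialisation. The function names `tokenize`, `tfidf_vectors`, and `kmeans` and the sample submissions are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Assumption: identifiers, numbers, and single punctuation marks are the terms.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def tfidf_vectors(documents):
    # Build one L2-normalised sparse TF-IDF vector (term -> weight) per document.
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two sparse vectors.
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, k, max_iter=50):
    # Deterministic start: first vector, then repeatedly the farthest remaining one.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                terms = set().union(*members)
                centroids[j] = {t: sum(m.get(t, 0.0) for m in members) / len(members)
                                for t in terms}
    return labels


# Hypothetical submissions: two renamed copies of each of two programs.
submissions = [
    "def add(a, b):\n    return a + b",
    "def add(x, y):\n    return x + y",
    "for i in range(10):\n    print(i * i)",
    "for j in range(10):\n    print(j * j)",
]
labels = kmeans(tfidf_vectors(submissions), k=2)
```

Submissions that land in the same cluster are only candidates for closer inspection; the precision figures quoted in the abstract come from the authors' own system and data, not from a sketch like this one.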



“…Such methods, as stated in [21], involve more complex as well as robust approaches. Normally, source code files are treated as text files, hence, common methods such as the traditional Bag-of-Words, character n-grams [12,22], and longest common sub-sequence [2,15] are among the most popular techniques.…”

Section: Related Work (mentioning)

confidence: 99%

On the Detection of SOurce COde Re-use

Flores, Rosso, Moreno, et al. 2015

Proceedings of the Forum for Information Retrieval Evaluation on - FIRE '14


This paper summarizes the goals, organization and results of the first SOCO competitive evaluation campaign for systems that automatically detect the source code re-use phenomenon. The detection of source code re-use is an important research field for both software industry and academia fields. Accordingly, PAN@FIRE track, named SOurce COde Re-use (SOCO) focused on the detection of re-used source codes in C/C++ and Java programming languages. Participant systems were asked to annotate several source codes whether or not they represent cases of source code re-use. In total five teams submitted 17 runs. The training set consisted of annotations made by several experts, a feature which turns the SOCO 2014 collection in a useful data set for future evaluations and, at the same time, it establishes a standard evaluation framework for future research works on the posed shared task.
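The citation statement for this entry names character n-grams and longest common subsequence as popular text-based techniques for comparing source files. As an illustration only, here is a minimal sketch of both, assuming a Jaccard overlap on character trigram sets and the textbook dynamic-programming LCS length. The function names and the choice of Jaccard as the set measure are assumptions, not details from any of the cited systems.

```python
def char_ngrams(text, n=3):
    # Set of all contiguous character substrings of length n.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_jaccard(a, b, n=3):
    # Jaccard overlap of the two n-gram sets, from 0.0 (disjoint) to 1.0 (identical sets).
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # both inputs too short to yield any n-gram
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def lcs_length(a, b):
    # Length of the longest common subsequence, keeping only one row of the DP table.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    # Normalise the LCS length by the longer input so the score lies in [0, 1].
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

Both measures treat code as plain text, so renaming every identifier lowers the score even when the program structure is unchanged; that limitation is part of why the quoted passage contrasts them with "more complex as well as robust approaches".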


“…This practice is defined as source code plagiarism by Cosma and Joy (2008). It involves obtaining source code either with or without the permission of the original author, and submitting the copied code with no, minor, or even major modifications aimed at concealing plagiarism (Rahal & Wielga, 2014).…”

Section: Introduction (mentioning)

confidence: 99%

Dolos: Language‐agnostic plagiarism detection in source code

Maertens, Petegem, Strijbol, et al. 2022

Computer Assisted Learning


Background Learning to code is increasingly embedded in secondary and higher education curricula, where solving programming exercises plays an important role in the learning process and in formative and summative assessment. Unfortunately, students admit that copying code from each other is a common practice and teachers indicate they rarely use plagiarism detection tools. Objectives We want to lower the barrier for teachers to detect plagiarism by introducing a new source code plagiarism detection tool (Dolos) that is powered by state‐of‐the art similarity detection algorithms, offers interactive visualizations, and uses generic parser models to support a broad range of programming languages. Methods Dolos is compared with state‐of‐the‐art plagiarism detection tools in a benchmark based on a standardized dataset. We describe our experience with integrating Dolos in a programming course with a strong focus on online learning and the impact of transitioning to remote assessment during the COVID‐19 pandemic. Results and Conclusions Dolos outperforms other plagiarism detection tools in detecting potential cases of plagiarism and is a valuable tool for preventing and detecting plagiarism in online learning environments. It is available under the permissive MIT open‐source license at https://dolos.ugent.be. Implications Dolos lowers barriers for teachers to discover, prove and prevent plagiarism in programming courses. This helps to enable a shift towards open and online learning and assessment environments, and opens up interesting avenues for more effective learning and better assessment.

