Source Code Plagiarism Detection Using Siamese BLSTM Network and Embedding Models

Manahi, Mohammed; Sulaiman, Suriani; Bakar, Normi Sham Awang Abu

doi:10.1007/978-981-16-8515-6_31

Cited by 3 publications

(2 citation statements)

References 25 publications

“…The detection process typically involves transforming the code into a high-dimensional feature representation followed by measurement of code similarity. Aside from traditionally used features extracted based on structural or syntactic properties of programs (Ji et al., 2007; Lange and Mancoridis, 2007), NLP-based approaches such as n-grams (Ohmann and Rahal, 2015), topic modeling (Ullah et al., 2021), character and word embeddings (Manahi, 2021), and character-level language models (Katta, 2018) are increasingly being used for robust code representations. Similarly for downstream similarity modeling or classification, unsupervised (Acampora and Cosma, 2015) and supervised (Bandara and Wijayarathna, 2011; Manahi, 2021) machine learning and deep learning algorithms are popularly used.…”

Section: Performance Assessment and Monitoring (mentioning)

confidence: 99%

Proactive and reactive engagement of artificial intelligence methods for education: a review

Mallik

Gangopadhyay

2023

Front. Artif. Intell.

The education sector has benefited enormously through integrating digital technology driven tools and platforms. In recent years, artificial-intelligence-based methods are being considered as the next generation of technology that can enhance the experience of education for students, teachers, and administrative staff alike. The concurrent boom of necessary infrastructure, digitized data and general social awareness has propelled these efforts further. In this review article, we investigate how artificial intelligence, machine learning, and deep learning methods are being utilized to support the education process. We do this through the lens of a novel categorization approach. We consider the involvement of AI-driven methods in the education process in its entirety, from student admissions, course scheduling, and content generation in the proactive planning phase to knowledge delivery, performance assessment, and outcome prediction in the reactive execution phase. We outline and analyze the major research directions under proactive and reactive engagement of AI in education using a representative group of 195 original research articles published in the past two decades, i.e., 2003–2022. We discuss the paradigm shifts in the solution approaches proposed, particularly with respect to the choice of data and algorithms used over this time. We further discuss how the COVID-19 pandemic influenced this field of active development and the existing infrastructural challenges and ethical concerns pertaining to global adoption of artificial intelligence for education.
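The citation statement for this entry describes detection as two steps: turn each program into a feature representation, then measure similarity between representations. A minimal sketch of that idea follows, using character n-grams (one of the NLP-based features the statement lists) and cosine similarity. The function names, the trigram size, and the sample C snippets are illustrative assumptions, not taken from any of the cited papers.

```python
from collections import Counter
import math


def char_ngrams(code: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after collapsing whitespace."""
    normalized = " ".join(code.split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical submissions: a function, a copy with renamed identifiers,
# and an unrelated function.
original = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
unrelated = 'void greet() { printf("hello"); }'

sim_renamed = cosine_similarity(char_ngrams(original), char_ngrams(renamed))
sim_unrelated = cosine_similarity(char_ngrams(original), char_ngrams(unrelated))
print(f"renamed copy: {sim_renamed:.3f}, unrelated: {sim_unrelated:.3f}")
```

In this toy setting the renamed copy scores higher than the unrelated function because most of its trigrams survive identifier renaming. Learned embeddings, as in the cited work, aim to be more robust to such edits than raw n-gram counts.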

“…Manahi et al. used a combination of Siamese networks, Bidirectional Long Short-Term Memory (BLSTM), and character embeddings on a dataset including 16,800 introductory course C assignments (Manahi et al., 2022). Siamese networks are multiple similar neural networks with the same configurations and weights.…”

Section: Related Work (mentioning)

confidence: 99%

Source Code Plagiarism Detection with Pre-Trained Model Embeddings and Automated Machine Learning

Ebrahim

Joy

2023

Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Proce

Source code plagiarism is a critical ethical issue in computer science education where students use someone else's work as their own. It can be treated as a binary classification problem where the output can be either 'yes' (plagiarism found) or 'no' (plagiarism not found). In this research, we have taken the open-source dataset 'SOCO', which contains two programming languages (PLs), namely Java and C/C++ (although our method could be applied to any PL). Source codes should be converted to vector representations that capture both the syntax and semantics of the text, known as contextual embeddings. These embeddings would be generated using source code pre-trained models (CodePTMs). The cosine similarity scores of three different CodePTMs were selected as features. The classifier selection and parameter tuning were conducted with the assistance of Automated Machine Learning (AutoML). The selected classifiers were tested, initially on Java, and the proposed approach produced average to high results compared to other published research, and surpassed the baseline (the JPlag plagiarism detection tool). For C/C++, the approach outperformed other research work and produced the highest ranking score.
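The citation statement for this entry characterizes Siamese networks as branches that share the same configuration and weights. The sketch below illustrates only that weight-sharing property. It is not the BLSTM model from the cited paper: the encoder is swapped for a mean over untrained, randomly initialized character embeddings, and the names `SHARED_EMBEDDINGS`, `encode`, `siamese_similarity`, and the embedding size are hypothetical choices made for illustration.

```python
import math
import random
from typing import List

EMBED_DIM = 8  # hypothetical size, chosen only for illustration

# One shared parameter set: a random vector per printable ASCII character.
# A trained model would learn these values; here they stay random.
_rng = random.Random(0)
SHARED_EMBEDDINGS = {
    chr(code): [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for code in range(32, 127)
}


def encode(source: str) -> List[float]:
    """Shared branch: mean-pool the character embeddings of the input."""
    vectors = [SHARED_EMBEDDINGS[ch] for ch in source if ch in SHARED_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def siamese_similarity(left: str, right: str) -> float:
    """Encode both inputs with the same weights, then compare by cosine."""
    u, v = encode(left), encode(right)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical pair of submissions.
a = "int sum(int a, int b) { return a + b; }"
b = "int add(int x, int y) { return x + y; }"
print(f"similarity: {siamese_similarity(a, b):.3f}")
```

Because both inputs pass through the same `encode` function, identical programs always map to identical vectors and the score does not depend on argument order. These are the two properties that weight sharing provides regardless of which encoder is used in the branch.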

Textual Plagiarism Detection Using Embedding Models and Siamese LSTM

Saeed

Taqa

2022

2022 International Conference for Natural and Applied Sciences (ICNAS)
