Plagiarism is defined as claiming someone else's ideas or work as one's own without citing the source. Plagiarism detection systems typically use a text similarity algorithm to look for common sentences between source and suspicious documents, either by matching sentences directly or by embedding each sentence into a vector using TFIDF or similar methods and then computing the distance or similarity between the source and suspect sentence vectors. Cosine similarity is one method for measuring that distance. To cluster the documents and select only related documents for detection, an unsupervised machine learning technique such as K-means can be used. In this paper, a plagiarism detection application was built and tested on several text document formats, including doc, docx, and pdf files of research papers collected from the web to form the source corpus. To calculate the degree of similarity between a suspicious article and the corpus of source articles, TFIDF text encoding is combined with NLP preprocessing, K-means clustering, and the cosine similarity algorithm. The proposed application was evaluated on five documents and produced different plagiarism ratios: 0.27 for the first document, 0.15 for the second, 0.19 for the third, 0.42 for the fourth, and 0.37 for the fifth. The generated report presents the detailed plagiarism percentage of the suspicious document, and, depending on a threshold value, the application decides whether the suspicious document is acceptable.
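The TFIDF-plus-cosine-similarity pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names are hypothetical, documents are assumed to be pre-tokenized word lists, and real systems would add stemming, stop-word removal, and sentence-level matching.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TFIDF vectors (dicts of term -> weight) for a list
    of tokenized documents. A minimal sketch for illustration only."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A suspicious document sharing vocabulary with a source document scores higher than one with no overlap, which is the signal the reported plagiarism ratios are built on.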
Plagiarism is described as using someone else's ideas or work without permission or attribution. Using lexical and semantic text similarity notions, this paper presents a plagiarism detection system that examines suspicious texts against available sources on the Web. The user can upload suspicious files in pdf or docx format. The system searches three popular search engines (Google, Bing, and Yahoo) for the source text and identifies the top five results on the first retrieved page of each engine. The corpus is built from the downloaded files and the scraped web-page text of the search results. The corpus text and the suspicious document are then encoded as vectors. For lexical plagiarism detection, the system leverages Jaccard similarity and Term Frequency-Inverse Document Frequency (TFIDF) techniques, while for semantic plagiarism detection it uses the Doc2Vec and Sentence Bidirectional Encoder Representations from Transformers (SBERT) text representation models. The system then compares the suspicious text against the corpus text. Finally, a generated plagiarism report shows the total plagiarism ratio, the plagiarism ratio from each source, and other details.
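Of the lexical measures this abstract names, Jaccard similarity is the simplest: the size of the intersection of two word sets divided by the size of their union. A minimal sketch, assuming whitespace tokenization and lowercasing (the paper's actual preprocessing is not specified):

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
    Tokenization here is a plain lowercase whitespace split,
    which is an assumption for illustration."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Unlike TFIDF with cosine similarity, Jaccard ignores term frequency and weighting entirely, so it captures raw vocabulary overlap between the suspicious text and a source.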