2002
DOI: 10.1007/3-540-46043-8_4

Comparison of Overlap Detection Techniques

Abstract: Easy access to the World Wide Web has raised concerns about copyright issues and plagiarism. It is easy to copy someone else's work and submit it as someone's own. This problem has been targeted by many systems, which use very similar approaches. These approaches are compared in this paper and suggestions are made when different strategies are more applicable than others. Some alternative approaches are proposed that perform better than previously presented methods. These previous methods share two c…

Cited by 25 publications (22 citation statements)
References: 5 publications
“…Different methods and approaches have been used to tackle this issue of similarities between documents using semantically, syntactical or semantic features. Semantic similarity received less attention for the inherent difficulties of representing semantics and the limitations on assessment coverage of user studies [54] [72]. Commonly used methods for determination of similarity include fingerprinting [21], Information Retrieval [28] and other hybrid techniques [24] [44].…”
Section: Related Work (mentioning)
confidence: 99%
“…The combined use of syntactical POS tagging and text processing methods for the purpose of text similarity calculations and its applications was used in this recent work [72]- [77]. It was based on the intuition that similar (exact) documents would have similar (exact) syntactical structures.…”
Section: Related Work (mentioning)
confidence: 99%
“…Other mostly non-semantically oriented techniques have received more attention. These include fingerprinting [29], IR [6] and many hybrid techniques [5,13,14]. In information retrieval models, more emphasis is put on representing documents by their words and word frequencies.…”
Section: Related Work (mentioning)
confidence: 99%
“…It spans many fields of research including, among many other applications, copy/ near-copy detection [1,2], plagiarism [3][4][5], Information Retrieval (IR) [6,7] and computational biology [8][9][10]12]. Many such applications employee a combination of techniques and apply to multidisciplinary fields [13,14,15]. This paper reports on a work that investigated how related Arabic documents can be treated as modified versions of one another using edit operations [16][17][18].…”
Section: Introduction (mentioning)
confidence: 99%
“…These include plagiarism (Braumoeller, 2001;Monostori, 2002;Cook, 2002;Hoad, 2003;Gilbert, 2003;Pecorari, 2003;Chen, 2004;Bao, 2004), duplicate/ redundant publication (Doherty, 1996;Jefferson, 1998;Schein, 2001;Bailey, 2002;Von Elm, 2004;Mojon-Azzi, 2004;Gwilym, 2004), text/ document clustering (Maderlechner, 1997;Atlam, 2003;Dobrynin, 2004;Shin, 2004;Bansal, 2004), and information retrieval (Salton, 1991;Hui, 2004;Leuski, 2004;Muresan, 2004;Chang, 2004). These studies have shown that, in general, identifying similar documents through concept matching is quite difficult.…”
Section: Introduction (mentioning)
confidence: 99%