Proceedings of the 20th ACM International Conference on Information and Knowledge Management 2011
DOI: 10.1145/2063576.2063887
|View full text |Cite
|
Sign up to set email alerts
|

CoDet

Abstract: We study a generalized version of the near-duplicate detection problem which concerns whether a document is a subset of another document. In text-based applications, document containment can be observed in exact-duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, called CoDet, which focuses particularly on this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2011
2011
2014
2014

Publication Types

Select...
1
1

Relationship

1
1

Authors

Journals

citations
Cited by 2 publications
references
References 19 publications
0
0
0
Order By: Relevance