Proceedings of the 14th ACM International Conference on Information and Knowledge Management 2005
DOI: 10.1145/1099554.1099733

Redundant documents and search effectiveness

Cited by 38 publications (30 citation statements)
References 15 publications
“…Similar work, in a mono-lingual environment, involves the identification of redundant [4] and co-derivative [3] documents, using fingerprinting techniques. Fingerprints are compact representations of text chunks.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the context of web search, data redundancy in the search results has already been shown to be an issue [4]. For example, even if a document is considered to be relevant to an information need, when shown after a number of redundant documents, it does not provide the user any additional information.…”
Section: Introduction (mentioning)
confidence: 99%
“…(3) We suggest a new technique called bridging for estimating the origin of all selected shingles in a document even though only information about a very small number of shingles in the document is available. (4) We perform an extensive analysis of different algorithms on two real datasets and show that (1), (2) and (3) together provide the best solution for our problem.…”
Section: Introduction (mentioning)
confidence: 99%
“…(3) Semantic duplication, where pages contain (almost) the same content, but different words. Most attention in the past has been given to finding near-duplicate pages [4,6,10,11,12,16,17]. Recently, attention has shifted towards detecting partial replication [7,15,14], but none of the prior work focuses on the origin detection problem.…”
Section: Introduction (mentioning)
confidence: 99%