Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006
DOI: 10.1145/1148170.1148307

Building a test collection for complex document information processing

Cited by 204 publications (129 citation statements)
References 4 publications
“…IIT CDIP was created at the Illinois Institute of Technology (Lewis et al 2006; ) and is based on documents released under the Master Settlement Agreement (MSA) between the Attorneys General of several U.S. states and seven U.S. tobacco companies and institutes. The University of California San Francisco (UCSF) Library, with support from the American Legacy Foundation, has created a permanent repository, the Legacy Tobacco Documents Library (LTDL), for tobacco documents (Schmidt et al 2002), of which IIT CDIP is a cleaned up snapshot generated in 2005 and 2006.…”
Section: The IIT CDIP Collection
Citation type: mentioning
confidence: 99%
“…25 non-distorted images in this dataset are taken from two freely available datasets - University of Washington Dataset [5] and Tobacco Database [9]. For each document, multiple photos were taken from a fixed distance to capture the whole document, but the camera was focused at varying distance to generate a series of images with focal blur.…”
Section: Dataset
Citation type: mentioning
confidence: 99%
“…The collection used for the experiments is the Complex Document Information Processing (CDIP) test collection [6]. CDIP includes 7 million scanned documents and over 42 million pages, received from tobacco company lawsuits.…”
Section: CDIP Tobacco Dataset
Citation type: mentioning
confidence: 99%