Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data 2010
DOI: 10.1145/1871840.1871844

A platform for storing, visualizing, and interpreting collections of noisy documents

Abstract: The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretatio…

Cited by 8 publications (11 citation statements)
References 17 publications

“…In this paper, we have presented a vision for the future of experimental research in document analysis and described how our current DAE platform [8,9] can exploit collective intelligence to instill new practices in the field. This forms a significant first step toward a crowd-sourced document resource platform that can contribute in many ways to more reproducible and sustainable machine perception research.…”
Section: Results (mentioning)
confidence: 99%
“…We also refer to a proof-of-concept prototype platform for Document Analysis and Exploitation (DAE - not to be confused with DARE), accessible at http://dae.cse.lehigh.edu, which is capable of storing data, meta-data and interpretations, interaction software, and complete provenance as more fully described elsewhere [8,9]. DAE is an important step in the direction of DARE, but still short of the grand vision described earlier.…”
Section: Script and Screenplay For The Scenario (mentioning)
confidence: 99%
“…As already described in [4], the DAE platform stores more than just basic collections of documents. It supports a data model that, from a higher level viewpoint, incorporates concepts and relations summarized by the following set of axioms:…”
Section: A General Overview Of Data Model (mentioning)
confidence: 99%
“…Previous accounts of the features and the potential uses of the platform [4], [5], [7] focused on its architecture, implementation choices, and its potential to impact experimental research, especially with respect to reproducibility and accountability.…”
Section: Introduction and Context (mentioning)
confidence: 99%