Proceedings of the Fourth International Conference on Document Analysis and Recognition
DOI: 10.1109/icdar.1997.620657

UW-ISL document image analysis toolbox: an experimental environment

Abstract: This paper describes a document image analysis toolbox that includes a collection of data structures and algorithms supporting a variety of applications. An experimental environment has been built that lets developers develop, test, and optimize their algorithms and systems. Appropriate quantitative performance metrics have been developed for each kind of information that a document analysis technique infers. The performance of each algorithm has been evaluated based on these metrics and the UW-III document imag…

Cited by 14 publications (17 citation statements); references 4 publications.

“…It is important to note, however, that shared datasets are only a part of what is needed for performance evaluation, and since research in document analysis is often task-driven, specific interpretations of a dataset may exist. So whether the problem is invoice routing, building the semantic desktop, digital libraries, global intelligence, or document authentication, to name a few, the result tends to be application-specific, resulting in software solutions that integrate a complete pipeline of cascading methods and algorithms [14,23]. This most certainly does not affect the intrinsic quality of the underlying research, but it does tend to generate isolated clusters of very focused problem definitions and experimental requirements.…”
Section: Introduction (mentioning); confidence: 98%

“…For instance, there have been numerous attempts to produce common datasets for problems which arise in document analysis [14,25,24]. It is important to note, however, that shared datasets are only a part of what is needed for performance evaluation, and since research in document analysis is often task-driven, specific interpretations of a dataset may exist.…”
Section: Introduction (mentioning); confidence: 99%

“…One of the bottlenecks in training fully convolutional networks is the need for pixel-wise ground truth data. Previous document understanding datasets [32,45,51,7] are limited by both their small size and the lack of fine-grained semantic labels such as section headings, lists, or figure and table captions. To address these issues, we propose an efficient synthetic document generation process and use it to generate large-scale pretraining data for our network.…”
Section: Introduction (mentioning); confidence: 99%

“…Two factors have been instrumental in advancing the field of black-and-white document analysis. Firstly, the existence of public domain data sets like the UW [10] and MTDB [17], freeing researchers from the labor intensive task of creating datasets to work on. Secondly, the availability of standard evaluation tools for OCR and page segmentation [11], [24], [16] allowing knowledge exchange between different researchers.…”
Section: Introduction (mentioning); confidence: 99%