2019 International Conference on Document Analysis and Recognition Workshops (ICDARW) 2019
DOI: 10.1109/icdarw.2019.10029

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Abstract: In this paper, we present a new dataset for Form Understanding in Noisy Scanned Documents (FUNSD). Form Understanding (FoUn) aims at extracting and structuring the textual content of forms. The dataset comprises 200 fully annotated real scanned forms. The documents are noisy and exhibit large variabilities in their representation making FoUn a challenging task. The proposed dataset can be used for various tasks including text detection, optical character recognition (OCR), spatial layout analysis and entity la…

Cited by 210 publications
(178 citation statements)
References 16 publications
“…We validate our method, DocStruct, on two benchmarks, MedForm and FUNSD (Jaume et al, 2019). The first one is built by us and composed of medical examination reports, and the second is composed of various real, fully annotated, scanned forms.…”
Section: Check Items Results (mentioning)
confidence: 99%
“…• FUNSD-base: The baseline is offered in Jaume et al (2019), which uses semantic and layout information.…”
Section: Ablation Study and Baseline Comparison (mentioning)
confidence: 99%
“…We evaluate how reading order impacts overall performance of graph-based information extraction from form-like documents. We adopt two form understanding tasks as Jaume et al (2019), including word labeling and word grouping. Word labeling is the task of assigning each word a label from a set of predefined entity categories, realized by node classification.…”
Section: Methods (mentioning)
confidence: 99%
“…FUNSD. FUNSD (Jaume et al, 2019) is a public dataset for form understanding in noisy scanned documents, containing a collection of research, marketing, and advertising documents that vary widely in their structure and appearance. The dataset consists of 199 annotated forms with 9,707 entities and 31,485 word-level annotations for 4 entity types: header, question, answer, and other.…”
Section: Datasets (mentioning)
confidence: 99%
“…The study [7] presented a Form Understanding (FUNSD) dataset containing 199 completely annotated noisy scanned forms in JSON format. FUNSD consists of forms with diverse fields like Marketing, Advertisement, Science, and few others, which are used for text detection, Optical Character Recognition (OCR), and document layout understanding tasks.…”
Section: A Standard Datasets (mentioning)
confidence: 99%
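
The statements above describe FUNSD annotations as JSON files holding entities of four types (header, question, answer, other) with word-level annotations. As a rough sketch only, the Python below tallies entities per label and counts words in one annotation. The schema used here (a top-level `form` list whose items carry `label` and `words` keys), the `summarize` helper, and the sample record are assumptions for illustration, not taken from the text above.

```python
import json
from collections import Counter

# Hypothetical annotation for a single form. The key names ("form",
# "label", "words") are assumed for this sketch, and the content is made up.
SAMPLE = """
{"form": [
  {"id": 0, "label": "header", "text": "FAX COVER",
   "words": [{"text": "FAX"}, {"text": "COVER"}]},
  {"id": 1, "label": "question", "text": "Date:",
   "words": [{"text": "Date:"}]},
  {"id": 2, "label": "answer", "text": "3/4/98",
   "words": [{"text": "3/4/98"}]}
]}
"""


def summarize(annotation):
    """Return (entities per label, total word count) for one annotation."""
    entities = annotation.get("form", [])
    # Entities without a label are counted under "other" (an assumption).
    label_counts = Counter(e.get("label", "other") for e in entities)
    word_count = sum(len(e.get("words", [])) for e in entities)
    return label_counts, word_count


label_counts, word_count = summarize(json.loads(SAMPLE))
print(dict(label_counts), word_count)
```

Running the same function over every annotation file and summing the results would give dataset-level totals comparable to the entity and word counts quoted in the statements.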