2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093376
Multi-Modal Association based Grouping for Form Structure Extraction

Abstract: Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep learning-based, mostly focusing on extracting structure using fully convolutional networks through semantic segmentation. In this work, we present a novel multi-modal approach for form structure extraction. Given simple elements such as textruns and widgets, we extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups, which are essential for information colle…

Cited by 10 publications (5 citation statements); references 33 publications.
“…Sarkar et al. [16] predict all levels of the document hierarchy in parallel, making it quite efficient. Aggarwal et al. [2] offer an approach that is architecturally like a language-based approach but uses contextual pooling of CNN features, like [5]. They determine a context window by identifying a neighborhood of form elements and use a CNN to extract image features from this context window.…”
Section: Prior Work
confidence: 99%
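The neighborhood-based context window described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: the neighborhood criterion (k nearest element centers) and the `context_window` helper are assumptions for illustration; the cited work would then crop this window and feed it to a CNN for feature extraction.

```python
import numpy as np

def context_window(boxes, target, k=3):
    """Union bounding box of a target form element and its k nearest neighbors.

    boxes: (N, 4) array of [x0, y0, x1, y1] element boxes.
    target: index of the element whose context we want.
    Hypothetical sketch -- the actual neighborhood definition may differ.
    """
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    dists = np.linalg.norm(centers - centers[target], axis=1)
    neighbors = np.argsort(dists)[: k + 1]  # nearest k neighbors plus the target itself
    group = boxes[neighbors]
    # The context window is the tightest box covering the whole neighborhood.
    return np.array([group[:, 0].min(), group[:, 1].min(),
                     group[:, 2].max(), group[:, 3].max()])

boxes = np.array([[0, 0, 10, 5],    # a textrun
                  [12, 0, 22, 5],   # a widget on the same line
                  [0, 40, 10, 45],  # a distant element
                  [50, 50, 60, 55]])
print(context_window(boxes, target=0, k=1))  # -> [ 0  0 22  5]
```

In a full pipeline, the image region under this window would be passed through a CNN, giving each element features that encode its local layout context.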
“…NLP-based methods work on low-level elements (e.g., tokens) and model layout analysis as a sequence-labeling task. MMPAN [1] is presented to recognize form structures. DocBank [20] is proposed as a large-scale dataset for multimodal layout analysis, and several NLP baselines have been released.…”
Section: Related Work
confidence: 99%
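Casting layout analysis as sequence labeling means tagging each token with a region label and then decoding contiguous spans. A minimal sketch of the decoding step, assuming a standard BIO tag scheme (the labels and tags here are hypothetical, not from the cited papers):

```python
def decode_bio(tokens, tags):
    """Group a token sequence into labeled regions from BIO tags.

    tags use B-<Label> for a region start, I-<Label> for continuation, O for outside.
    Illustrative sketch of the sequence-labeling view of layout analysis.
    """
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [tok])          # start a new region
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)              # continue the open region
        else:                                   # "O" or an inconsistent I- tag
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(toks)) for label, toks in spans]

tokens = ["Abstract", "We", "present", "1", "Introduction"]
tags = ["B-Heading", "B-Paragraph", "I-Paragraph", "B-Heading", "I-Heading"]
print(decode_bio(tokens, tags))
# -> [('Heading', 'Abstract'), ('Paragraph', 'We present'), ('Heading', '1 Introduction')]
```

In practice the tags would come from a sequence model (e.g., a BiLSTM or transformer tagger) rather than being given, but the decoding into regions is the same.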
“…Some regions (e.g., Figure, Table) can be easily identified by visual features, while semantic features are important for separating visually similar regions (e.g., Abstract and Paragraph). Therefore, some recent efforts try to combine both modalities [1, 20, 39, 3]. Here we summarize them into two categories.…”
Section: Introduction
confidence: 99%
“…The above-mentioned methods are CV-based, considering layout analysis as detection or segmentation tasks. There are also some NLP-based methods [10, 40], viewing layout analysis as a sequence-labeling task. These methods usually obtain text information through PDF parsing or OCR recognition. The text information provides auxiliary NLP-modality enhancement when mixed with CV-based methods, while for CV-based unimodal methods, the performance depends heavily on optimized visual feature representation.…”
Section: Introduction
confidence: 99%