2019 International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2019.00021
Multimodal Document Image Classification

Cited by 46 publications (35 citation statements)
References 21 publications
“…Pondenkandath et al 23 explored four applications of document classification, including handwriting style, layout, font, and authorship, using a residual network. Jain and Wigington 22 fused visual features extracted with a CNN-based deep learning network and noisy semantic information obtained via OCR to identify document categories. Khan et al 26 proposed a CNN-based approach to detect mismatched ink color in hyperspectral document images for identifying forged documents.…”
Section: Image Classification Using CNN
confidence: 99%
“…Meanwhile, an average ensembling method was applied to combine the textual and visual streams in the proposed approach. Jain and Wigington [10] proposed another method for DLA based on multimodal feature fusion, combining feature representations of the visual and text modalities.…”
Section: Related Work
confidence: 99%
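The average-ensembling combination mentioned above can be sketched as the element-wise mean of the per-class probability distributions produced by each stream. The numbers below are hypothetical stand-ins for the outputs of trained text and visual models.

```python
import numpy as np

# Hypothetical per-class probabilities from two independent streams
# over four document classes.
text_stream = np.array([0.70, 0.10, 0.10, 0.10])    # text-modality prediction
visual_stream = np.array([0.40, 0.40, 0.10, 0.10])  # visual-modality prediction

# Average ensembling: element-wise mean of the two distributions.
ensemble = (text_stream + visual_stream) / 2.0

print(ensemble)                 # [0.55 0.25 0.1  0.1 ]
print(int(ensemble.argmax()))   # predicted class index: 0
```

Because each input is a valid probability distribution, their mean is too, so no renormalization is needed after averaging.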
“…It has been empirically established that textual features matter more for document image classification than for natural image classification [9], [10]. Image features improve classification results in a multi-modal setting but cannot perform the task alone [11], [12]. However, past multi-modal research has concentrated on the image network while leaving the text feature extractor almost unexplored, even though the state of the art in Natural Language Processing (NLP) has seen a paradigm shift with transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT).…”
Section: Introduction
confidence: 99%