2022
DOI: 10.20944/preprints202202.0058.v2
Preprint

Are Deep Models Robust against Real Distortions? A Case Study on Document Image Classification

Abstract: Deep neural networks have been extensively researched in the field of document image classification to improve classification performance and have shown excellent results. However, there is little research in this area that addresses the question of how well these models would perform in a real-world environment, where the data the models are confronted with often exhibits various types of noise or distortion. In this work, we present two separate benchmark datasets, namely RVL-CDIP-D and Tobacco3482-D, to eva…

Cited by 4 publications (12 citation statements)
References 24 publications
“…[35] recently introduced DocXClassifier, a state-of-the-art transformer-inspired CNN that not only attains the highest performance in image-based classification but also possesses the property of being inherently explainable. Recent works [48,49] have also explored the use of Vision Transformers (ViTs) [50] for the document image classification task but have found it challenging to surpass CNNs using basic training approaches, even on sufficiently large datasets. However, a recent study [51] has shown that extensive pretraining enables ViTs to achieve performance levels comparable to those of CNNs but at the cost of additional training and increased complexity.…”
Section: Document Image Classification (mentioning)
confidence: 99%
“…Subsequent research has extensively used these datasets to evaluate and improve the robustness of deep learning models using various strategies [30,34]. Taking inspiration from this, Saifullah et al (2022) [11] recently introduced two robustness benchmark datasets, namely RVL-CDIP-D and Tobacco3482-D, designed for the document domain. These datasets were generated by applying 21 different types of novel distortions to generate out-of-distribution counterparts for the RVL-CDIP [17] and Tobacco3482 document datasets, respectively.…”
Section: Model Robustness (mentioning)
confidence: 99%
“…In this section, we present a quantitative evaluation of the robustness of our proposed DocXClassifier models using the two benchmark datasets, RVL-CDIP-D and Tobacco3482-D [11], as discussed in Sec. 4.1.…”
Section: Evaluation Of Model Robustness (mentioning)
confidence: 99%
“…As large volumes of documents are produced on a daily basis, there is an urgent need today to automate the processing of these documents to facilitate tasks such as search, retrieval, and information extraction. However, automated processing of documents can be particularly challenging for a number of reasons, including high levels of data complexity [1], large inter-class similarity and intra-class variance [2], and corruption of scanned document data with various types of distortions [3].…”
Section: Introduction (mentioning)
confidence: 99%