dhSegment: A Generic Deep-Learning Approach for Document Segmentation

Oliveira, Sofia Ares; Séguin, Benoît; Kaplan, Frédéric

doi:10.1109/icfhr-2018.2018.00011

Cited by 149 publications

(168 citation statements)

References 18 publications

Mentioning: 160


“…In the same idea, [12] also investigated such deep architectures to classify identity documents. [13] goes even further by trying to segment the full layout of a document image into paragraphs, titles, ornaments, images etc. These models focus on extracting strong visual features from the images to classify the documents based on their layout, geometry, colors and shape.…”

Section: Related Work (mentioning)

confidence: 99%

“…For industrial-grade applications dealing with user-generated content, such a data augmentation is necessary to alleviate overfitting and reduce the gap between train and actual data. Preprocessing page segmentation and layout analysis tools, such as dhSegment [13] can also bring significant improvements by renormalizing image orientation and cropping the document before sending it to the classifier. Moreover, as we have seen, the post-OCR word embeddings include lots of noisy or completely wrong words that generate OOV errors.…”

Section: Limitations (mentioning)

confidence: 99%


Multimodal Deep Networks for Text and Image-Based Document Classification

Audebert, Herold, Slimani et al. 2020

Communications in Computer and Information Science


Keywords: document classification, multimodal learning, data fusion. Abstract: Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based on their visual appearance. However, the fine-grained classification that is required in real-world settings cannot be achieved by visual analysis alone. Often, the relevant information is in the actual text content of the document. We design a multimodal neural network that is able to learn from word embeddings, computed on text extracted by OCR, and from the image. We show that this approach boosts pure image accuracy by 3% on Tobacco3482 and RVL-CDIP augmented by our new QS-OCR text dataset, even without clean text information.



“…The learning problem is how to adjust the HMM parameters ($a_{ij}$, $b_i(x)$, $c_{jg}$, $\mu_{jg}$ and $\Sigma_{jg}$), so that a given set of observations (called training set) is generated by the model with maximum likelihood. The Baum-Welch algorithm [20] (also known as Forward-Backward algorithm) is used to find these unknown parameters. It is an expectation-maximization (EM) algorithm.…”

Section: The Learning Problem and the Baum-Welch Algorithm (mentioning)

confidence: 99%

“…Throughout the years several surveys have been performed [8,13,14,19,20,24] to compile this work and classify the underlying strategies used to tackle this task. I do not intend to provide a survey as complex and detailed as the aforementioned articles.…”

Section: State Of The (mentioning)

confidence: 99%

“…If we look at the most current works about convolutional neural networks (CNN) [20,22] applied to the TRDC task, we can observe that these networks are often used to produce "simplified images". These simplified images eliminate all types of noise present in the image and mark areas where the desired regions are present with some likelihood.…”

Section: Tandem: Convolutional Neural Network and Hidden Markov Models (mentioning)

confidence: 99%


Advances in Document Layout Analysis

Campos


Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the different page regions of interest found in handwritten documents. HTS remains an active topic that has gained importance over the years, due to the increasing demand to provide textual access to the myriads of handwritten document collections held by archives and libraries. This thesis considers HTS as a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical positions of each region of interest as a by-product. The extraction phase consists in calculating the best contour coordinates of the region using the position information provided by the detection phase. Our proposed detection approach allows us to attack both higher level regions (paragraphs, diagrams, etc.) and lower level regions like text lines. In the case of text line detection we model the problem to ensure that the system's yielded vertical position approximates the fictitious line that connects the lower part of the grapheme bodies in a text line, commonly known as the baseline.
