Rémy Mullot scite author profile

Recent progress in the digitization of heterogeneous collections of ancient documents has raised new challenges in information retrieval in digital libraries and in document layout analysis. To control the quality of historical document image digitization and to characterize document content with intermediate-level metadata (between image and document structure), we propose a fast automatic layout segmentation of old document images based on five descriptors. These descriptors, based on the autocorrelation function, are obtained by multiresolution analysis and then used in a specific clustering method. The proposed method makes no hypothesis about the document structure, whether the document model (physical structure) or the typographical parameters (logical structure). It is also parameter-free, since it adapts automatically to the image content. In this paper, we first detail our proposal to characterize the content of old documents by extracting autocorrelation features in the different areas of a page and at several resolutions. We then show that it is possible to find automatically the homogeneous regions defined by similar autocorrelation indices, without knowing the number of clusters, using adapted hierarchical ascendant classification and consensus clustering approaches. To assess the method, we apply our algorithm to 316 old document images, which encompass six centuries (1200-1900) of French history, in order to demonstrate its performance in segmenting and characterizing the content of a heterogeneous corpus. Moreover, we define a new evaluation metric, the homogeneity measure, which evaluates the segmentation and characterization accuracy of our methodology. We obtain a mean homogeneity accuracy of 85%. These results help to represent a document by a hierarchy of layout structure and content, and to define one or more signatures for each page on the basis of a hierarchical representation of homogeneous blocks and their topology.
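The abstract above does not specify the five descriptors, so the following is only a minimal Python sketch of the underlying tool, under stated assumptions: a normalized circular autocorrelation of a small grey-level patch, and a 2x2 averaging step standing in for one level of a multiresolution analysis. The function names `autocorrelation` and `downsample` and the toy striped patch are illustrative and are not taken from the paper.

```python
def autocorrelation(patch):
    """Normalized circular autocorrelation of a 2-D list of grey levels.

    Assumption: the patch is wrapped around at its borders (circular shifts);
    the paper may treat borders differently.
    """
    h, w = len(patch), len(patch[0])
    mean = sum(map(sum, patch)) / (h * w)
    centered = [[v - mean for v in row] for row in patch]
    energy = sum(v * v for row in centered for v in row)
    acf = [[0.0] * w for _ in range(h)]
    for dy in range(h):
        for dx in range(w):
            s = 0.0
            for y in range(h):
                for x in range(w):
                    s += centered[y][x] * centered[(y + dy) % h][(x + dx) % w]
            # A flat patch has zero energy; report zero correlation for it.
            acf[dy][dx] = s / energy if energy else 0.0
    return acf


def downsample(patch):
    """Halve the resolution by averaging each 2x2 block (hypothetical helper)."""
    h, w = len(patch) // 2, len(patch[0]) // 2
    return [
        [
            (patch[2 * y][2 * x] + patch[2 * y][2 * x + 1]
             + patch[2 * y + 1][2 * x] + patch[2 * y + 1][2 * x + 1]) / 4
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy patch: horizontal stripes with a period of two rows, loosely like text lines.
stripes = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]

fine = autocorrelation(stripes)                # full resolution
coarse = autocorrelation(downsample(stripes))  # one level coarser
```

On the striped patch the autocorrelation is +1 at even vertical shifts and -1 at odd ones, the kind of periodic signature that regular text lines produce. After one averaging step the stripes vanish, so the response drops to zero at the coarser resolution.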

Symbol and character recognition: application to engineering drawings

Adam

Ogier

Cariou

et al. 2000

International Journal on Document Analysis and Recognition

In this paper, we consider the general problem of technical document interpretation, applied to the documents of the French telephone operator, France Telecom. More precisely, we focus on the computation of a new set of features that allows the classification of multi-oriented and multi-scaled patterns. This set of invariants is based on the Fourier-Mellin transform. The interest of this computation lies in the excellent classification rate obtained with the method, and in the possibility of using the Fourier-Mellin transform in a "filtering mode", which makes it possible to address the well-known and difficult problem of connected character recognition.

Texture feature evaluation for segmentation of historical document images

Mehri

Gomez‐Krämer

Héroux

et al. 2013

Texture feature analysis has undergone tremendous growth in recent years. It plays an important role in the analysis of many kinds of images. More recently, texture analysis techniques have become a logical and relevant choice for historical document image segmentation, where document images are significantly degraded and information on the document structure, such as the document model and the typographical parameters, is lacking. However, previous work on texture analysis for the segmentation of digitized historical document images has been limited to testing, separately, one of the well-known texture-based approaches such as the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), Gabor filters, gradient, wavelets, etc. In this paper we ask which texture-based method is better suited, on the one hand, to discriminating graphical regions from textual ones and, on the other hand, to separating textual regions with different sizes and fonts. The objective of this paper is to compare some of the well-known texture-based approaches (autocorrelation function, GLCM, and Gabor filters) used in the segmentation of digitized historical document images. Texture features are briefly described, and quantitative results are obtained on simplified historical document images. The achieved results are very encouraging.
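As an illustration of one of the approaches named above, here is a minimal Python sketch of a Grey Level Co-occurrence Matrix and of the contrast feature derived from it. It assumes an image already quantized to a small number of grey levels and a single pixel offset; the function names `glcm` and `contrast`, the offset, and the toy image are illustrative choices, not the configuration used in the paper.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at offset (dx, dy).

    Assumptions: `image` is a 2-D list of integers in range(levels), and the
    matrix is left non-symmetric and unnormalized.
    """
    matrix = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:  # skip pairs that leave the image
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: mean squared grey-level difference over all pairs."""
    total = sum(map(sum, matrix))
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


# Toy 4x4 image with four grey levels (0..3).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

horizontal = glcm(toy, levels=4)        # right-hand neighbour, offset (1, 0)
toy_contrast = contrast(horizontal)
```

A smooth region concentrates its counts on the diagonal of the matrix and gives a low contrast, while a region with frequent grey-level transitions spreads counts away from the diagonal and gives a higher one.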

Texture feature benchmarking and evaluation for historical document image analysis

Mehri

Héroux

Gomez‐Krämer

et al. 2017

IJDAR

Texture-based methods are pervasive in the sub-fields and tasks of document image analysis, and particularly in historical document image analysis. Nevertheless, given the large diversity of texture-based methods used for historical document image analysis, a few questions arise. First, which texture methods are well suited to segmenting graphical contents from textual ones, discriminating various text fonts and scales, and separating different types of graphics? Second, which texture-based method offers a constructive compromise between performance and computational cost? To obtain satisfactory and clear answers to these questions, this article benchmarks the most classical and widely used texture-based feature sets, using a classical texture-based pixel-labeling scheme on a large corpus of historical documents. We focus on determining the performance of each texture-based feature set according to the document content. The results reported in this study provide, first, a qualitative measure of which texture-based feature sets are the most appropriate and, second, a useful benchmark in terms of performance and computational cost for current and future research efforts in historical document image analysis.

Rémy Mullot

Document image characterization using a multiresolution analysis of the texture: application to old documents

Old document image segmentation using the autocorrelation function and multiresolution analysis
