Arabic/Latin and Machine-printed/Handwritten Word Discrimination using HOG-based Shape Descriptor

Saïdani, Asma; Kacem, Afef; Belaïd, Abdel

doi:10.5565/rev/elcvia.762

Cited by 19 publications

(3 citation statements)

References 31 publications

“…Some approaches, however, compute a combination of both statistical and structural features [80], [81]. Other approaches have used the histogram of oriented gradients as a descriptor [6], [82]. The pyramid histogram of oriented gradients has also been extracted [7].…”

Section: Feature Extraction (mentioning)

confidence: 99%

Deep Sparse Auto-Encoder Features Learning for Arabic Text Recognition

et al. 2021

Arabic text recognition is one of the most challenging current problems in pattern recognition and artificial intelligence. It remains an active and unsolved research field because of several factors: the cursive nature of Arabic writing, character similarities, unlimited vocabulary, the use of multiple sizes and mixed fonts, etc. To handle these challenges, automatic Arabic text recognition requires a robust system that combines discriminative features with a rigorous classifier to achieve improved performance. In this work, we introduce a new deep-learning-based system that recognizes Arabic text contained in images. We propose a novel hybrid network that combines a Bag-of-Feature (BoF) framework for feature extraction, based on a deep Sparse Auto-Encoder (SAE), with Hidden Markov Models (HMMs) for sequence recognition. Our proposed system, termed BoF-deep SAE-HMM, is tested on four datasets: the printed Arabic line images of Printed KHATT (P-KHATT), the benchmark printed word images of Arabic Printed Text Image (APTI), the benchmark handwritten Arabic word images of IFN/ENIT, and the benchmark handwritten digit images of the Modified National Institute of Standards and Technology (MNIST) dataset.

“…Newell et al (2011) have extended the HOG descriptor to include features at multiple scales for character recognition. Saidani et al (2015) have proposed a novel approach for Arabic and Latin script identification based on Histogram of Oriented Gradients feature descriptors. HOG is first applied at word level based on writing orientation analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Scale Space Co-Occurrence HOG Features for Word Spotting in Handwritten Document Images

Prabhakar

2016

International Journal of Computer Vision and Image Processing

In this paper, the authors propose a Scale Space Co-occurrence Histograms of Oriented Gradients method (SS Co-HOG) for retrieving words from digitized handwritten documents. HOG-based word spotting performs poorly on handwritten documents because HOG ignores the spatial information of neighboring pixels, whereas Co-HOG captures it by counting the co-occurrences of the gradient orientations of two or more neighboring pixels. The authors represent each image at three scale parameters; at each scale, they divide the word image into blocks, extract Co-HOG features from each block, and concatenate them to form a feature descriptor. The proposed method is evaluated using precision and recall on the popular IAM and GW datasets, and the authors report that it performs better on both.
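
The block-wise co-occurrence counting described in this abstract can be illustrated with a minimal single-scale sketch. It is not the authors' implementation: the function names, the 8 orientation bins, the 2x2 block grid, the two pixel offsets, and the choice to skip flat pixels are all assumptions made for illustration, and the three-scale representation is left out.

```python
import math


def orientation_bins(image, n_bins=8):
    """Quantize each pixel's gradient orientation into one of n_bins labels.

    Assumption: pixels with zero gradient get None and are skipped later.
    """
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, with coordinates clamped at the borders.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            labels[y][x] = int(angle / (2 * math.pi) * n_bins) % n_bins
    return labels


def cohog_descriptor(image, blocks=(2, 2), offsets=((0, 1), (1, 0)), n_bins=8):
    """Concatenate per-block co-occurrence counts of orientation pairs.

    For every block and every (dy, dx) offset, count how often orientation
    bin a at a pixel co-occurs with bin b at the offset neighbour.
    """
    labels = orientation_bins(image, n_bins)
    h, w = len(image), len(image[0])
    rows, cols = blocks
    descriptor = []
    for i in range(rows):
        for j in range(cols):
            y0, y1 = i * h // rows, (i + 1) * h // rows
            x0, x1 = j * w // cols, (j + 1) * w // cols
            for dy, dx in offsets:
                counts = [0] * (n_bins * n_bins)
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        yy, xx = y + dy, x + dx
                        if not (0 <= yy < h and 0 <= xx < w):
                            continue  # neighbour falls outside the image
                        a, b = labels[y][x], labels[yy][xx]
                        if a is None or b is None:
                            continue  # flat pixel, no orientation
                        counts[a * n_bins + b] += 1
                descriptor.extend(counts)
    return descriptor
```

Under these assumptions the descriptor length is `rows * cols * len(offsets) * n_bins ** 2`. A multi-scale variant in the spirit of the abstract would compute one such descriptor per smoothed copy of the word image and concatenate the results.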

“…In [10] have proposed Arabic and Latin script identification based on Histogram of Oriented Gradients feature descriptors. In [11] have proposed a unsupervised segmentation word spotting method based on grid of HOG descriptors, and a sliding-window approach is used to locate the document regions that are most similar to the query.…”

Section: Related Work (mentioning)

confidence: 99%

Segmentation Based Word Spotting Method for Handwritten Documents

Prabhakar

2017

IJARCSSE

Abstract: In this paper, we present a segmentation-based word spotting method for handwritten document images using the Co-occurrence Histograms of Oriented Gradients (Co-HOG) descriptor. The drawback of the Histogram of Oriented Gradients (HOG) is that it ignores the spatial information of adjacent pixels, whereas Co-HOG takes spatial contextual information into account by capturing the co-occurrence of orientation pairs of neighbouring pixels. To construct the Co-HOG descriptor for word spotting, we divide a word image into blocks, extract Co-HOG features from each block, and concatenate them to form a feature descriptor. The proposed method is evaluated using precision and recall through experiments on the popular GW dataset, and the results confirm that our method performs better on this dataset.

Keywords: Word spotting, Character Recognition, George Washington, Dynamic Time Warping, Hidden Markov Models

I. INTRODUCTION

Recently, Document Image Analysis has become a dynamic research field that draws the attention of researchers due to its complexity and the growing need to access the content of digitized information. Optical Character Recognition (OCR) has been explored for a few decades with great success, helping to automate manual procedures. OCR techniques usually recognize words by processing fonts independently and work well with machine-printed fonts in clean conditions. Large quantities of document images are accumulated in digital libraries, and processing these documents with OCR is computationally expensive because of the difficulty of understanding the page layout of digitized documents, irregular writing styles, faded ink, stained paper, and other adverse factors. To overcome these problems, researchers have proposed a method called word spotting.

Word spotting is a relatively new alternative for text recognition and retrieval in digitized printed and handwritten documents. Handwritten word spotting is the pattern classification task of detecting a given query word in handwritten document images. Word spotting in handwritten documents is not completely solved because of the various challenges these documents pose. Hence, we focus on word spotting in handwritten documents rather than printed documents. Generally, a word spotting method consists of three main modules: pre-processing, feature extraction, and feature matching. Among them, feature extraction is one of the most important factors for achieving high retrieval performance, because features with strong discriminative information can be classified well even with the simplest classifier.

The literature shows that the HOG descriptor is extensively used in numerous recognition applications because of its discriminative capability compared to other existing feature descriptors. The HOG descriptor is developed by Dalal Importantly, HOG considers orientation of only isolated pixels, whereas spatial information of adjacent pixels i...
