2019
DOI: 10.1007/s10032-019-00336-x

HWNet v2: an efficient word image representation for handwritten documents

Abstract: We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with the region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable sized word images, and (iii) a realis…

Cited by 63 publications (40 citation statements)
References 80 publications
“…As shown in Figure 5, different kinds of Telugu word images, such as occlusion-affected, missing-segment, noisy, randomly distorted, and missing-segment-with-random-distortion images, are considered as query word images. The proposed TWIR system using DL-CNN is assessed by computing mean average precision (mAP) and mean average recall (mAR), and compared with conventional TWIR systems such as SIFT-BoVW [14], HMM-C [16], SURF-BoVW [17], GLCM-IPC [18], HWNET v2 [19] and SDM-NSCT [21]. As discussed earlier, the simulation analysis covers several kinds of Telugu word images and obtains improved mAP and mAR even when the query word images contain unwanted information, whether introduced automatically during acquisition, manually by a human, or by a printing machine during scanning.…”
Section: Results
confidence: 99%
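The mAP metric used in the excerpt above can be sketched as follows. This is a generic retrieval-evaluation sketch, not code from the cited work; the binary relevance lists are hypothetical stand-ins for the rankings a real TWIR system would produce.

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 flags,
    ordered by retrieval rank (1 = relevant result)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_rankings):
    """mAP: the mean of per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

# Toy example: two queries with hypothetical relevance judgements.
queries = [[1, 0, 1, 0], [0, 1, 1, 1]]
map_score = mean_average_precision(queries)  # ≈ 0.7361
```

Mean average recall (mAR) is computed analogously, averaging recall rather than precision over the ranked lists.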
“…Recently, an efficient approach for Telugu script recognition and retrieval using a transformation-based methodology was proposed in [21], which used missing-segment, noisy, corrupted and occlusion-affected word images as query inputs, and also considered word images with multiple conjunct vowel-consonant clusters to demonstrate the robustness of the proposed algorithm. However, feature extraction from the word image plays a significant role in a retrieval system, and this is quite hard in [19][20][21] and in other word-image retrieval systems in the literature. Thus, a DL-CNN with principal component analysis based pairwise Hamming distance (PCA-H) is employed to extract the most informative features from a given word image; the resulting feature map represents the word image more effectively and helps retrieve more relevant word images even on larger databases.…”
Section: Introduction
confidence: 99%
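A generic PCA-plus-Hamming pipeline of the kind the excerpt names can be sketched as follows: project descriptors with PCA, binarize by sign, and compare codes with Hamming distance. This is a minimal NumPy sketch under those assumptions; the exact PCA-H formulation in the cited work may differ.

```python
import numpy as np

def pca_hamming_codes(features, dim=4):
    """Project descriptors onto the top PCA directions, then
    binarize by sign to get compact codes for Hamming matching."""
    X = features - features.mean(axis=0)
    # PCA via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    projected = X @ vt[:dim].T
    return (projected > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 128))   # 5 hypothetical word-image descriptors
codes = pca_hamming_codes(feats, dim=4)
d = hamming_distance(codes[0], codes[1])
```

Binary codes make nearest-neighbour search over large word-image databases cheap, since Hamming distance reduces to bit operations.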
“…These uneven lines are typically caused by uncontrolled oscillations of the hand muscles, dampened by inertia [33]. In handwriting recognition, elastic distortion augmentation is a way to simulate these oscillations [17,33]. We found that although elastic distortion augmentation improves classification results, it has a negative effect on localization accuracy.…”
Section: Error Analysis and Future Work
confidence: 87%
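The elastic distortion augmentation the excerpt refers to can be sketched in the spirit of the standard Simard-style formulation: a random displacement field is smoothed with a Gaussian and scaled before warping the image. Parameter values here are illustrative, not those used in the cited work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=3.0, seed=None):
    """Warp a 2-D grayscale image with a smoothed random
    displacement field: sigma controls smoothness, alpha the
    displacement magnitude (both values are illustrative)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([ys + dy, xs + dx])
    # Bilinear interpolation at the displaced coordinates.
    return map_coordinates(image, coords, order=1, mode="reflect")
```

The smoothed field mimics the low-frequency hand-muscle oscillations described above; as the excerpt notes, such warps can help classification while degrading precise localization.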
“…Image Features: We use the pre-final layer representations from deep neural networks trained to classify word-images. Such representations capture the discriminatory information between different word-images and have demonstrated success in embedding similar images together [14]. The activation for an image can be considered as a compact representation in a continuous space.…”
Section: A. Features For Clustering
confidence: 99%
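Reading off the pre-final layer as an embedding can be illustrated with a toy model. The tiny fully connected network below is a hypothetical stand-in for a trained word-image CNN; only the idea of using the penultimate activation as the representation matches the excerpt.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class TinyNet:
    """Toy stand-in for a trained classification network: the
    pre-final (penultimate) layer activation is used as the
    holistic word-image embedding, the final layer only for
    the classification loss during training."""
    def __init__(self, d_in=64, d_hidden=32, n_classes=10, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.w2 = rng.normal(scale=0.1, size=(d_hidden, n_classes))

    def embed(self, x):
        # Pre-final layer activation: the compact representation.
        return relu(x @ self.w1)

    def logits(self, x):
        # Classification head, used only at training time.
        return self.embed(x) @ self.w2

x = np.random.default_rng(1).normal(size=(3, 64))  # 3 hypothetical images
net = TinyNet()
emb = net.embed(x)   # shape (3, 32): one fixed-length vector per image
```

Images whose embeddings lie close together in this space are then treated as candidates for the same word, which is what makes the representation usable for clustering and retrieval.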
“…The CNN based feature extractor used in our experiments follows the architecture described in Krishnan and Jawahar [14]. The network was initially trained on synthetic handwritten word images and later fine-tuned on a real-world corpus.…”
Section: A. Our Pipeline
confidence: 99%
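The two-stage regime the excerpt describes, pre-training on synthetic data and fine-tuning on a real corpus, can be sketched with a deliberately simple softmax-regression stand-in for the CNN; the data, dimensions, and learning rates are hypothetical, and only the schedule reflects the cited approach.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(W, X, y, lr, steps, n_classes):
    """Full-batch gradient descent on cross-entropy. The model is a
    toy stand-in for the CNN; the point is the two-stage schedule."""
    for _ in range(steps):
        p = softmax(X @ W)
        onehot = np.eye(n_classes)[y]
        W -= lr * X.T @ (p - onehot) / len(X)
    return W

rng = np.random.default_rng(0)
n_classes, d = 5, 16
W = rng.normal(scale=0.01, size=(d, n_classes))

# Stage 1: pre-train on a plentiful synthetic corpus (hypothetical data).
X_syn, y_syn = rng.normal(size=(200, d)), rng.integers(0, n_classes, 200)
W = train(W, X_syn, y_syn, lr=0.5, steps=100, n_classes=n_classes)

# Stage 2: fine-tune on the smaller real-world corpus at a lower rate,
# starting from the pre-trained weights rather than from scratch.
X_real, y_real = rng.normal(size=(40, d)), rng.integers(0, n_classes, 40)
W = train(W, X_real, y_real, lr=0.05, steps=50, n_classes=n_classes)
```

Starting the second stage from pre-trained weights rather than a random initialization is what lets the scarce real handwriting data specialize, rather than have to build, the representation.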