2011 International Conference on Document Analysis and Recognition 2011
DOI: 10.1109/icdar.2011.270

Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach

Cited by 22 publications (11 citation statements)
References 12 publications
“…Then, the location quality γ_n is computed for each retrieved area A_Ret,n in the order of their rank n that is provided by the word spotting system. Each time a relevant area is hit according to (3), this relevant area is deleted and cannot be hit again. Therefore, the best match according to the keyword spotter wins and subsequently retrieved areas cannot score if they do not overlap with another relevant area.…”
Section: Proposed Evaluation Measures
mentioning
confidence: 99%
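
The rank-ordered matching described in this statement can be sketched as below. This is a minimal illustration, not the citing paper's implementation: the hit criterion (3) is not reproduced in the excerpt, so the sketch takes it as a caller-supplied predicate `is_hit`. The names `Box`, `greedy_hits`, and `overlaps` are hypothetical, and `overlaps` is only a stand-in test (any positive overlap) used to make the example runnable. The location quality γ_n is not computed, since its definition is not given here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical area representation: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[int, int, int, int]


def greedy_hits(
    retrieved: Sequence[Box],
    relevant: Sequence[Box],
    is_hit: Callable[[Box, Box], bool],
) -> List[bool]:
    """For each retrieved area, in rank order, report whether it scores a hit.

    A relevant area that has been hit is removed, so a lower-ranked
    retrieved area cannot score against it again.
    """
    remaining = list(relevant)
    hits: List[bool] = []
    for area in retrieved:  # assumed to be sorted by the spotter's rank
        match = next((r for r in remaining if is_hit(area, r)), None)
        if match is not None:
            remaining.remove(match)  # this relevant area cannot be hit again
        hits.append(match is not None)
    return hits


def overlaps(a: Box, b: Box) -> bool:
    """Stand-in hit test (any positive overlap). NOT criterion (3) from the quote."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


retrieved = [(0, 0, 10, 10), (1, 1, 9, 9), (20, 20, 30, 30)]
relevant = [(0, 0, 10, 10), (20, 20, 30, 30)]
hits = greedy_hits(retrieved, relevant, overlaps)
```

In this example the second retrieved area overlaps only the relevant area that the top-ranked result already consumed, so it does not score, while the third still hits the remaining relevant area.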
“…Thus, α_IA punishes the missing inner area (IA), i.e., the relevant area from the ground truth that is not retrieved, while α_OA penalizes the outer area (OA), i.e., the area that is outside the corresponding relevant area from the ground truth. The parameter c > 0 determines the maximum size of the outer area for which the retrieved area is still considered to be a hit according to (3).…”
Section: Proposed Evaluation Measures
mentioning
confidence: 99%
“…The connected components are extracted in the binarized image of printed Urdu text by [10] to segment it into ligatures or partial words to which a set of two scalar and four vector features stored in the database represent.…”
Section: Segmentation
mentioning
confidence: 99%
“…Despite the presence of some recent developments in layout analysis systems for Arabic and Urdu documents [9], the non-existence of commercial or opensource OCR techniques for these scripts make it difficult to navigate efficiently through scanned documents. Even the OCR-free technique presented by [7] can not be applied to these scripts due to highly non-uniform distribution of intra and inter word distances [10]. Moreover, lack of knowledge about location of the digits would make it impossible to differentiate between a ToC page and a page whose structure is similar to a ToC page.…”
mentioning
confidence: 99%