Towards Modelling an Attention-Based Text Localization Process

Clavelli, Antonio; Karatzas, Dimosthenis; Lladós, Josep; Ferraro, Mario; Boccignone, Giuseppe

doi:10.1007/978-3-642-38628-2_35

Cited by 3 publications

(3 citation statements)

References 17 publications

“…However, "textual objects" are a difficult task as opposed to faces for which, at least, efficient and effective face detectors do exist [110], if one is not concerned with the biological plausibility of the algorithm. Actually, our current research work is indeed addressed at verifying the suitability of our model in a difficult practical problem such as text localisation and detection "in the wild", in order to overcome present limitations of attentive-based approaches proposed within such realm [26]. To this end, we are adapting the model to handle time-varying images, and we are performing mobile eyetracking experiments outside the lab, in complex urban environment.…”

Section: Discussion and Final Remarks (mentioning)

confidence: 99%

“…But, more generally, the priority map could also be used to take into account contextual spatial modulation of visual attention [104]. We do not consider here this problem, but integrating contextual issues in our scheme is readily done (say, in the form P(L(t) | L(t − 1), r_F(t − 1), Gist)), and it has been experimented for a text localisation task in urban street pictures using an earlier and simplified version of the model presented here [26].…”

Section: Moment-to-moment Scene Perception W(t) (mentioning)

confidence: 99%

“…The motivation for this choice is that Torralba's saliency well correlates with text appearance [90] and it can be used as a rough but reliable estimate of its likelihood P(F | O = text). Further, the main reason for using a simulated text likelihood estimator (instead of a real one such as in [26]) is that one can exploit ad-hoc control of the number of true positive / false positive regions. Having computed these coarse object-based maps it is easy to infer the initial priority map P(L | I_LR) [24] (Fig.…”

Section: Simulation: Gaze Shift Sampling (mentioning)

confidence: 99%
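
The statement above describes turning a coarse text-likelihood map into an initial priority map over locations, from which gaze shifts can then be sampled. The following is a minimal sketch of that idea, not the cited authors' implementation. It assumes the priority map is simply the normalised product of a per-location likelihood and a spatial prior (uniform by default), and that the next gaze location is drawn in proportion to priority. The function names `priority_map` and `sample_gaze` and the toy grid are hypothetical.

```python
import random


def priority_map(likelihood, prior=None):
    """Normalise likelihood * prior into a probability map over locations.

    `likelihood` is a 2D list standing in for P(F | O = text) per location.
    `prior` is an optional 2D list of the same shape (assumed uniform if None).
    """
    rows, cols = len(likelihood), len(likelihood[0])
    if prior is None:
        prior = [[1.0] * cols for _ in range(rows)]
    unnorm = [
        [likelihood[r][c] * prior[r][c] for c in range(cols)]
        for r in range(rows)
    ]
    total = sum(sum(row) for row in unnorm)
    if total <= 0:
        raise ValueError("likelihood and prior share no probability mass")
    return [[v / total for v in row] for row in unnorm]


def sample_gaze(pmap, rng):
    """Draw one (row, col) location with probability proportional to pmap."""
    cells = [(r, c) for r in range(len(pmap)) for c in range(len(pmap[0]))]
    weights = [pmap[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy 2x2 "saliency" grid used as a stand-in text likelihood.
    toy_likelihood = [[0.0, 1.0], [1.0, 2.0]]
    pmap = priority_map(toy_likelihood)
    rng = random.Random(0)  # seeded for repeatability
    print(pmap)
    print([sample_gaze(pmap, rng) for _ in range(5)])
```

Sampling from the map, rather than always taking its maximum, is what lets a scheme of this kind produce different scan paths on repeated runs over the same image.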

Modelling Task-Dependent Eye Guidance to Objects in Pictures

et al. 2014

Self Cite

We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze is to be considered in the context of a general action-perception loop relying on two strictly intertwined processes: sensory processing, depending on current gaze position, identifies sources of information that are most valuable under the given task; motor processing links such information with the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. This way the model also accounts for statistical properties of gaze shifts such as individual scan path variability. Results of the simulations are compared either with experimental data

Worldly Eyes on Video: Learnt vs. Reactive Deployment of Attention to Dynamic Stimuli

Cuculo, D’Amelio, Grossi, et al. 2019

Lecture Notes in Computer Science

Con-Text: Text Detection for Fine-Grained Object Classification

Karaoğlu, Tao, Gemert, et al. 2017

IEEE Trans. on Image Process.

This paper focuses on fine-grained object classification using recognized scene text in natural images. While the state-of-the-art relies on visual cues only, this paper is the first work which proposes to combine textual and visual cues. Another novelty is the textual cue extraction. Unlike the state-of-the-art text detection methods, we focus more on the background instead of text regions. Once text regions are detected, they are further processed by two methods to perform text recognition, i.e., ABBYY commercial OCR engine and a state-of-the-art character recognition algorithm. Then, to perform textual cue encoding, bi- and trigrams are formed between the recognized characters by considering the proposed spatial pairwise constraints. Finally, extracted visual and textual cues are combined for fine-grained classification. The proposed method is validated on four publicly available data sets: ICDAR03, ICDAR13, Con-Text, and Flickr-logo. We improve the state-of-the-art end-to-end character recognition by a large margin of 15% on ICDAR03. We show that textual cues are useful in addition to visual cues for fine-grained classification. We show that textual cues are also useful for logo retrieval. Adding textual cues outperforms visual- and textual-only in fine-grained classification (70.7% to 60.3%) and logo retrieval (57.4% to 54.8%).
