Abstract: Natural urban scene images pose many problems for character recognition, such as luminance noise, varying font styles, and cluttered backgrounds. Detecting and recognizing text in a natural scene is therefore a difficult problem. Several techniques have been proposed to overcome these problems. However, they are usually based on a bottom-up scheme, which produces many false positives and false negatives and requires intensive computation. An alternative, efficient, character-based, expectancy-driven method is therefore needed. This paper presents a modeling approach for expectancy-driven techniques based on the well-known SIFT algorithm. The resulting models (Object Attention Patches) are evaluated in terms of their individual, provisional character recognition performance. The trained patch models are then used in preliminary experiments on text detection in scene images. The results show that the proposed model-based approach can be applied in a coherent SIFT-based text detection and recognition process.