Snoopertrack: Text detection and tracking for outdoor videos

Minetto, Rodrigo; Thome, Nicolas; Cord, Matthieu; Leite, Neucimar J.; Stolfi, Jorge

doi:10.1109/icip.2011.6116563

Cited by 51 publications

(43 citation statements)

References 14 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Thus, it is more robust to background changes than methods that do not attempt to identify the text pixels. For example, [2], [6], [7] perform tracking by extracting features from all pixels in a bounding box, including background pixels. Moreover, SWT is rotationinvariant, and SIFT is robust to rotation, scale change and viewpoint change.…”

Section: A Identification Of Text Instancesmentioning

confidence: 99%

“…Thus, two instances of the same text, but with drastically different backgrounds, may be wrongly classified as two different text objects. In the second category, many features have been explored for text tracking: difference of intensity values [5], horizontal and vertical projection profiles of gradient magnitudes [2], motion vectors in the P-frames of MPEG videos [6] and Histogram of Oriented Gradients [7]. However, because these features are not robust to rotation and viewpoint change, it is difficult to extend them to handle complex text movements.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Recognition of Video Text through Temporal Integration

Phan

Shivakumara

Lü

et al. 2013

2013 12th International Conference on Document Analysis and Recognition

View full text Add to dashboard Cite

This paper presents a method for temporal integration, which can be used to improve the recognition accuracy of video texts. Given a word detected in a video frame, we use a combination of Stroke Width Transform and SIFT (Scale Invariant Feature Transform) to track it both backward and forward in time. The text instances within the word's framespan are then extracted and aligned at pixel level. In the second step, we integrate these instances into a text probability map. By thresholding this map, we obtain an initial binarization of the word. In the final step, the shapes of the characters are refined using the intensity values. This helps to preserve the distinctive character features (e.g., sharp edges and holes), which are useful for OCR engines to distinguish between the different character classes. Experiments on English and German videos show that the proposed method outperforms existing ones in terms of recognition accuracy.

show abstract

Section: A Identification Of Text Instancesmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Recognition of Video Text through Temporal Integration

Phan

Shivakumara

Lü

et al. 2013

2013 12th International Conference on Document Analysis and Recognition

View full text Add to dashboard Cite

show abstract

“…The motivating application for text classifiers such as T-HOG and R-HOG is the detection of text in photos and videos of arbitrary scenes [29,30]. Specifically, the idea is to use the classifier to filter the output of a fast but "permissive" (high-recall and moderate-precision) detector.…”

Section: T-hog As a Post-filter To Text Detectionmentioning

confidence: 99%

T-HOG: An effective gradient-based descriptor for single line text regions

Minetto

Thome

Cord

et al. 2013

Pattern Recognition

View full text Add to dashboard Cite

We discuss the use of histogram of oriented gradients (HOG) descriptors as an effective tool for text description and recognition. Specifically, we propose a HOG-based texture descriptor (T-HOG) that uses a partition of the image into overlapping horizontal cells with gradual boundaries, to characterize single-line texts in outdoor scenes. The input of our algorithm is a rectangular image presumed to contain a single line of text in Roman-like characters. The output is a relatively short descriptor, that provides an effective input to an SVM classifier. Extensive experiments show that the T-HOG is more accurate than Dalal and Triggs's original HOG-based classifier, for any descriptor size. In addition, we show that the T-HOG is an effective tool for text/non-text discrimination and can be used in various text detection applications. In particular, combining T-HOG with a permissive bottom-up text detector is shown to outperform state-of-the-art text detection systems in two major publicly available databases.

show abstract

“…This yields an extra speed-up that can be exploited in see-though applications (e.g. Augmented Reality translation [6], [7] and augmented documents [9]) or street-view navigation [5].…”

Section: Introductionmentioning

confidence: 99%

“…Camera based scene text analysis applications targeted specifically for mobile and wearable devices is an interesting area of research receiving increasing attention [1], [2], [3], [4], [5], [6], [7]. Although the newly arrived products in the mobile device market (2013) feature high definition cameras of up to 12 mega-pixel sensors, and powerful quad-core processors, they still have many limitations in comparison with standard desktop computers: e.g.…”

Section: Introductionmentioning

confidence: 99%

MSER-Based Real-Time Text Detection and Tracking

Gómez

Karatzas

2014

2014 22nd International Conference on Pattern Recognition

View full text Add to dashboard Cite

We present a hybrid algorithm for detection and tracking of text in natural scenes that goes beyond the fulldetection approaches in terms of time performance optimization. A state-of-the-art scene text detection module based on Maximally Stable Extremal Regions (MSER) is used to detect text asynchronously, while on a separate thread detected text objects are tracked by MSER propagation. The cooperation of these two modules yields real time video processing at high frame rates even on low-resource devices.

show abstract

Snoopertrack: Text detection and tracking for outdoor videos

Cited by 51 publications

References 14 publications

Recognition of Video Text through Temporal Integration

Recognition of Video Text through Temporal Integration

T-HOG: An effective gradient-based descriptor for single line text regions

MSER-Based Real-Time Text Detection and Tracking

Contact Info

Product

Resources

About