ICDAR 2021 Competition on Scene Video Text Spotting

Cheng, Zhanzhan; Lü, Jing; Zou, Baorui; Zhou, Shuigeng; Wu, Fei

doi:10.1007/978-3-030-86337-1_43

Cited by 4 publications

(3 citation statements)

References 12 publications


“…Similar to Cheng et al [3], we argue that recognizing text in each frame is too computationally expensive and might introduce erroneous recognitions. However, in contrast to Cheng et al, we argue that the computational cost of STE in video is mainly driven by the detection module.…”

Section: HyText (supporting)

confidence: 76%

“…The aforementioned method has been criticized by Cheng et al [3]. They argue that reading text in every frame is excessively computationally costly and thus operationally unsuitable.…”

Section: Related Work (mentioning)

confidence: 99%

“…The next step is to extract the correct string from the streams. In order to do this, we introduce a hybrid approach that does neither entirely depend on majority voting, nor on text selection such as in YORO [3]. We argue that unfiltered majority voting may introduce flawed text recognitions, which can be easily filtered out.…”

Section: Text Stream Recognition (mentioning)

confidence: 99%
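
The citation statement above contrasts unfiltered majority voting with a filtered variant. As a rough illustration only, the following sketch drops low-confidence per-frame readings before voting. The confidence threshold, the `(text, confidence)` input shape, and the function name `filtered_majority_vote` are all assumptions made for this example; the quoted paper does not specify its filtering rule here, so this is not the authors' method.

```python
from collections import Counter


def filtered_majority_vote(readings, min_confidence=0.5):
    """Pick one string for a tracked text stream.

    readings: list of (text, confidence) pairs, one per frame (hypothetical shape).
    min_confidence: assumed filter criterion; readings below it are discarded.
    Returns the most frequent surviving string, or None if nothing survives.
    """
    kept = [text for text, conf in readings if text and conf >= min_confidence]
    if not kept:
        return None
    # most_common(1) returns the single most frequent string and its count.
    return Counter(kept).most_common(1)[0][0]


# Three noisy low-confidence readings outnumber two confident ones.
stream = [("EXIT", 0.9), ("EXIT", 0.8), ("EX1T", 0.3), ("EX1T", 0.2), ("EX1T", 0.4)]
print(filtered_majority_vote(stream))                      # filtered vote
print(filtered_majority_vote(stream, min_confidence=0.0))  # plain majority vote
```

With the threshold set to 0.0 the function reduces to plain majority voting, which makes it easy to compare the two behaviours on the same stream.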


HyText – A Scene-Text Extraction Method for Video Retrieval

Theus; Rossetto; Bernstein

2022, MultiMedia Modeling


Scene-text has been shown to be an effective query target for video retrieval applications in a known-item search context. While much progress has been made in scene-text extraction from individual pictures, the special case of video has so far received less attention. This paper introduces HyText, a scene-text extraction method for video with a focus on retrieval applications. HyText uses intermittent scene-text detection in combination with bi-directional tracking in order to increase throughput without reducing detection accuracy.
