“…Traditional OCRs are based on carefully crafted features, whereas in modern deep learning-based approaches, informative features are learned from a massive training dataset (e.g., Jaderberg, Vedaldi, & Zisserman, 2014;Zhou et al, 2017). Text detection in video streams, as opposed to text documents or still images, is a challenging task due to several factors, including motion, shadow, lighting, viewing angles, and complex backgrounds, all of which make detecting and extracting text from its background di cult (see Mancas & Gosseli, 2007, for a comprehensive review).…”