Recently, end-to-end text spotting that aims to detect and recognize text from cluttered images simultaneously has received particularly growing interest in computer vision. Different from the existing approaches that formulate text detection as bounding box extraction or instance segmentation, we localize a set of points on the boundary of each text instance. With the representation of such boundary points, we establish a simple yet effective scheme for end-to-end text spotting, which can read the text of arbitrary shapes. Experiments on three challenging datasets, including ICDAR2015, TotalText and COCO-Text demonstrate that the proposed method consistently surpasses the state-of-the-art in both scene text detection and end-to-end text recognition tasks.
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training. However, collecting and labeling real text images is expensive and timeconsuming, which limits the availability of real data. Therefore, most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models. To alleviate this problem, recent semi-supervised STR methods exploit unlabeled real data by enforcing characterlevel consistency regularization between weakly and strongly augmented views of the same image. However, these methods neglect word-level consistency, which is crucial for sequence recognition tasks. This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects. Specifically, we devise a shortest path alignment module to align the sequential visual features of different views and minimize their distance. Moreover, we adopt a reinforcement learning framework to optimize the semantic similarity of the predicted strings in the embedding space. We conduct extensive experiments on several standard and challenging STR benchmarks and demonstrate the superiority of our proposed method over existing semi-supervised STR methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.