Hundreds of text detection methods have been proposed, motivated by their widespread use in several applications. Despite the huge progress in the area, which includes even the use of sophisticated learning schemes, ad-hoc post-processing procedures are often employed to improve the text detection rate, by removing both false positives and negatives. Another issue refers to the lack of the use of the complementary views provided by different text detection methods. This paper aims to fill these gaps. We propose the use of a soft computing framework, based on genetic programming (GP), to guide the definition of suitable post-processing procedures through the combination of basic operators, which may be applied to improve detection results provided by multiple methods at the same time. Performed experiments in the widely used ICDAR 2011, ICDAR 2013, and ICDAR 2015 datasets demonstrate that our GP-based approach leads to F1 effectiveness gains up to 5.1 percentage points, when compared to several baselines. INDEX TERMS Scene text detection, multi-oriented text, convolutional neural network, data fusion, genetic programming.
Stereo vision is a growing topic in computer vision due to the innumerable opportunities and applications this technology offers for the development of modern solutions, such as virtual and augmented reality applications. To enhance the user's experience in threedimensional virtual environments, the motion parallax estimation is a promising technique to achieve this objective. In this paper, we propose an algorithm for generating parallax motion effects from a single image, taking advantage of state-of-the-art instance segmentation and depth estimation approaches. This work also presents a comparison against such algorithms to investigate the trade-off between efficiency and quality of the parallax motion effects, taking into consideration a multi-task learning network capable of estimating instance segmentation and depth estimation at once. Experimental results and visual quality assessment indicate that the PyD-Net network (depth estimation) combined with Mask R-CNN or FBNet networks (instance segmentation) can produce parallax motion effects with good visual quality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.