Text Spotting can be used as an approach to retrieve information found in images that cannot be obtained otherwise, by performing text detection first and then recognizing the located text. Examples of images to apply this task on can be found in Tor network images, which contain information that may not be found in plain text. When comparing both stages, the latter performs worse due to the low resolution of the cropped areas among other problems. Focusing on the recognition part of the pipeline, we study the performance of five recognition approaches, based on state-ofthe-art neural network models, standalone OCR, and OCR enhancements. We complement them using string-matching techniques with two lexicons and compare computational time on five different datasets, including Tor network images. Our final proposal achieved 39,70% precision of text recognition in a custom dataset of images taken from Tor domains.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.