In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector based attention weights that are applied on context vectors across both time and their individual components. We evaluate our system on a 3400 hours Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline model achieving about 20% relative reduction in word error rates.
Any paper document when converted to electronic form through standard digitizing devices, like scanners, is subject to a small tilt or skew. Meanwhile, a de-skewed document allows a more compact representation of its components, particularly text objects, such as words, lines, and paragraphs, where they can be represented by their rectilinear bounding boxes. This simplified representation leads to more efficient, robust, as well as simpler algorithms for document image analysis including optical character recognition (OCR). This paper presents a new method for automatic detection of skew in a document image using mathematical morphology. The proposed algorithm is extremely fast as well as independent of script forms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.