The interpretation of handwritten sentences is carried out using a holistic approach in which text image recognition and the interpretation itself are tightly integrated. Conventional approaches follow a serial, first-recognition-then-interpretation scheme, which cannot adequately use semantic–pragmatic knowledge to recover from recognition errors. Stochastic finite-state transducers are shown to be suitable models for this integration, permitting full exploitation of the final interpretation constraints. Continuous-density hidden Markov models are embedded in the edges of the transducer to account for lexical and morphological constraints. Robustness with respect to stroke vertical variability is achieved by integrating tangent vectors into the emission densities of these models. Experimental results on a syntax-constrained interpretation task show the effectiveness of the proposed approaches. These results are also shown to be better than those achieved with other conventional, N-gram-based techniques, which do not take advantage of full integration.
Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper, we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for translation from and into 6 European languages, for a total of 30 different translation directions. The corpus has been compiled from the debates held in the European Parliament between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.