Recent advances in Handwritten Text Recognition and Document Layout Analysis have made it possible to convert digital images of manuscripts into electronic text. However, providing this text with the correct structure and context is still an open problem, and it must be solved before the relevant information conveyed by the text can be extracted. The most important structure needed for a set of text elements is their reading order. Most studies on the reading order problem use rule-based approaches and focus on printed documents. Much less attention has been paid so far to handwritten text documents, where the problem becomes particularly important and challenging. In this work, we propose a new approach to automatically determine the reading order of text regions and lines in handwritten text documents. The task is approached as a sorting problem where the order-relation operator is automatically learned from examples. We experimentally demonstrate the effectiveness of our method on three different datasets at different hierarchical levels.
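The idea of sorting with a learned order relation can be illustrated with a minimal sketch. It is not the authors' model: it assumes each region is reduced to a normalized center point (x, y), uses a hand-rolled logistic regression as a stand-in for the learned pairwise operator, and orders regions by their summed probability of preceding the others. The names `TRAIN_PAGES`, `pair_features`, `precedes_prob`, `train` and `reading_order`, as well as the toy data, are hypothetical.

```python
import math
from itertools import permutations

# Toy training pages (hypothetical data): each page lists region centers
# (x, y), normalized to [0, 1], already given in their true reading order.
TRAIN_PAGES = [
    [(0.5, 0.1), (0.5, 0.5), (0.5, 0.9)],
    [(0.2, 0.3), (0.8, 0.3), (0.2, 0.7), (0.8, 0.7)],
]


def pair_features(a, b):
    """Features of the ordered pair (a, b): horizontal and vertical offsets."""
    return (b[0] - a[0], b[1] - a[1])


def precedes_prob(w, a, b):
    """Estimated probability that region a is read before region b."""
    x = pair_features(a, b)
    z = w[0] * x[0] + w[1] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def train(pages, epochs=200, lr=0.5):
    """Fit the pairwise operator by stochastic gradient ascent on the
    logistic log-likelihood, using every ordered pair of regions per page."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for page in pages:
            for i, j in permutations(range(len(page)), 2):
                target = 1.0 if i < j else 0.0  # 1 if region i comes first
                err = target - precedes_prob(w, page[i], page[j])
                x = pair_features(page[i], page[j])
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
    return w


def reading_order(w, regions):
    """Return region indices in predicted reading order.

    Each region is scored by the sum of its probabilities of preceding
    every other region; sorting by that score tolerates a pairwise
    operator that is not perfectly transitive.
    """
    n = len(regions)
    scores = [
        sum(precedes_prob(w, regions[i], regions[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: -scores[i])


if __name__ == "__main__":
    weights = train(TRAIN_PAGES)
    shuffled = [(0.8, 0.6), (0.2, 0.2), (0.2, 0.6), (0.8, 0.2)]
    print(reading_order(weights, shuffled))
```

Aggregating pairwise probabilities rather than passing the operator directly to a comparison sort is one possible design choice; it avoids undefined behavior when the learned relation contains cycles. A real system would presumably use richer features and a stronger classifier than this two-weight model.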
Document Layout Analysis (DLA) is a process that must be performed before a modern automatic or semiautomatic system attempts to recognize the content of handwritten musical scores. DLA should segment the document image into semantically useful region types such as staff, lyrics, etc. We present a system that extends the ideas of DLA for handwritten text documents to perform region segmentation, region classification and baseline detection over handwritten musical scores in an integrated manner. Several experiments were carried out on two different datasets to validate this approach and assess it in different scenarios. Results show high accuracy on such complex manuscripts and competitive computational time, which is a good indicator of the scalability of the method to very large collections.