Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layouts. Simple, discriminative features are extracted at the connected-component level, and robust feature vectors are then generated from them. A multilayer perceptron classifier assigns each connected component to the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of both block segmentation and pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-note layout formats, achieving a segmentation accuracy of about 95%.
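The abstract does not specify how the voting scheme works, so the following is only a minimal sketch of one plausible form: each connected component is relabelled by a majority vote among its own predicted class and the predictions of its nearest neighbours. The function name `refine_labels`, the use of centroids, the Euclidean neighbourhood, and the parameter `k` are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from math import hypot


def refine_labels(centroids, labels, k=4):
    """Relabel each component by majority vote over itself and its k nearest neighbours.

    centroids: list of (x, y) component centres (hypothetical representation).
    labels:    list of per-component class predictions, e.g. from a classifier.
    Votes are always counted on the original labels, so the result does not
    depend on the order in which components are visited.
    """
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by Euclidean distance to component i.
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        votes = Counter(labels[j] for j in others[:k])
        votes[labels[i]] += 1  # the component's own prediction also counts

        best = max(votes.values())
        if votes[labels[i]] == best:
            # Keep the original label when it wins or ties.
            refined.append(labels[i])
        else:
            refined.append(votes.most_common(1)[0][0])
    return refined


# Toy example: one main-text component was misclassified as side-note text.
centroids = [(x, 0) for x in range(5)] + [(100 + x, 0) for x in range(5)]
labels = ["main", "main", "side", "main", "main"] + ["side"] * 5
refined = refine_labels(centroids, labels, k=4)
```

In this toy layout the isolated "side" prediction inside the main-text cluster is outvoted by its four neighbours, while the genuine side-note cluster is left unchanged.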
Segmentation of curled textlines from warped document images is one of the major issues in document image dewarping. Most curled textline segmentation algorithms in the literature are sensitive to the degree of curl, the direction of curl, and the spacing between adjacent lines. We present a new algorithm for curled textline segmentation that is robust to these problems, at the expense of high execution time. We demonstrate this insensitivity in a performance evaluation section. Our approach is based on a state-of-the-art image segmentation technique, the Active Contour Model (Snake), with the novel idea of using several baby snakes that converge in the vertical direction only. Experiments on the publicly available CBDAR 2007 document image dewarping contest dataset show that our textline segmentation algorithm achieves an accuracy of 97.96%.
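To make the "vertical direction only" idea concrete, here is a rough sketch under stated assumptions, not the authors' implementation. It replaces the full snake formulation with a simple explicit update: each contour point keeps its x coordinate fixed, and its y coordinate is moved by a second-difference smoothness term plus the vertical component of an external force field. The function `evolve_vertical_snake`, the parameters `alpha`, `step`, and `iterations`, and the synthetic force field in the example are all hypothetical.

```python
import numpy as np


def evolve_vertical_snake(xs, ys, force_y, alpha=0.2, step=1.0, iterations=200):
    """Deform an open contour by moving its points vertically only.

    xs:      fixed integer column of each contour point (never modified).
    ys:      initial row of each contour point.
    force_y: 2-D array holding the vertical external force at each pixel
             (assumed here to point towards the textline).
    """
    xs = np.asarray(xs, dtype=int)
    ys = np.asarray(ys, dtype=float).copy()
    height = force_y.shape[0]
    for _ in range(iterations):
        # Sample the external force at the nearest pixel row of each point.
        rows = np.clip(np.rint(ys).astype(int), 0, height - 1)
        external = force_y[rows, xs]
        # Internal smoothness term: discrete second difference along the contour,
        # with the end points repeated so the contour stays open.
        padded = np.pad(ys, 1, mode="edge")
        internal = padded[:-2] - 2.0 * ys + padded[2:]
        ys = np.clip(ys + step * (alpha * internal + external), 0, height - 1)
    return ys


# Synthetic example: a horizontal "textline" of ink density centred on row 10.
height, width = 21, 30
row_index = np.arange(height, dtype=float)[:, None]
density = np.exp(-((row_index - 10.0) ** 2) / (2 * 3.0 ** 2)) * np.ones((1, width))
force_y = np.gradient(density, axis=0)  # points towards the density ridge

xs = np.arange(5, 25)
ys0 = np.full(xs.shape, 7.0)  # baby snake initialised three rows above the line
ys = evolve_vertical_snake(xs, ys0, force_y)
```

Because only `ys` is updated, the contour can follow a curled line vertically without sliding along it horizontally; a real system would derive the force field from the document image rather than from a synthetic density.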