In this paper, we present a new text line extraction method for handwritten Arabic documents. The proposed technique is based on a generalized adaptive local connectivity map (ALCM) computed with a steerable directional filter. The algorithm is designed to handle the particularly difficult problems seen in handwritten documents, such as fluctuating, touching, or crossing text lines. The proposed algorithm consists of three steps. First, a steerable filter probes the foreground intensity along multiple directions at each pixel to generate the ALCM, which is then binarized with an adaptive thresholding algorithm to obtain a rough estimate of the text line locations. Second, connected component analysis classifies text and non-text patterns in the binarized ALCM to refine the location of the text lines. Finally, the text lines are separated by superimposing the text line patterns in the ALCM on the original document image and extracting the connected components covered by the pattern mask. Analysis of experimental results on the DARPA MADCAT Arabic handwritten document data indicates that the method is robust and capable of correctly isolating handwritten text lines even on challenging document images.
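The three steps can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on several simplifying assumptions: the steerable filter is replaced by averaging ink along short line segments at three hypothetical slopes (`SLOPES`), the adaptive threshold is replaced by a single threshold relative to the map's peak (`rel_thresh`), and the text versus non-text classification is skipped. All function and parameter names are made up for illustration.

```python
from collections import Counter, deque

SLOPES = (-0.5, 0.0, 0.5)  # assumed probe orientations, standing in for a steerable filter

def directional_map(img, half_len):
    """Per pixel, the largest mean ink density along any probed direction."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for slope in SLOPES:
                total = 0
                for k in range(-half_len, half_len + 1):
                    yy, xx = y + round(slope * k), x + k
                    if 0 <= yy < h and 0 <= xx < w:  # pixels outside the page count as blank
                        total += img[yy][xx]
                out[y][x] = max(out[y][x], total / (2 * half_len + 1))
    return out

def label(mask):
    """8-connected component labelling; returns (label grid, component count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny in range(cy - 1, cy + 2):
                        for nx in range(cx - 1, cx + 2):
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count

def extract_lines(img, half_len=2, rel_thresh=0.4):
    """Return (grid of line ids per ink pixel, number of line patterns found)."""
    # Step 1: connectivity map, binarized with a global threshold
    # (a simplification of the adaptive thresholding named in the abstract).
    alcm = directional_map(img, half_len)
    peak = max(max(row) for row in alcm)
    mask = [[v > 0 and v >= rel_thresh * peak for v in row] for row in alcm]
    # Step 2: each connected pattern in the binarized map is treated as one text line.
    line_of, n_lines = label(mask)
    # Step 3: superimpose the patterns on the page; each ink component goes to the
    # line pattern that covers most of its pixels (0 means unassigned).
    ink_of, n_ink = label(img)
    votes = [Counter() for _ in range(n_ink + 1)]
    for y, row in enumerate(ink_of):
        for x, comp in enumerate(row):
            if comp and line_of[y][x]:
                votes[comp][line_of[y][x]] += 1
    winner = [v.most_common(1)[0][0] if v else 0 for v in votes]
    return [[winner[comp] for comp in row] for row in ink_of], n_lines

# Toy page: a straight upper line and a lower line that drifts down by one row.
page = [[int(c == "#") for c in row] for row in [
    "................",
    ".###.###.###.##.",
    "................",
    "................",
    "................",
    "................",
    ".###.###........",
    ".........###.##.",
    "................",
]]
assigned, n_lines = extract_lines(page)
```

On this toy page the separate blobs of each line end up sharing one line id even though the lower line changes rows partway through, which is the behaviour the directional probing is meant to provide. On real scans the probe length, the set of orientations, and the threshold would all need tuning.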
<div><i>ChemML</i> is an open machine learning and informatics program suite that is designed to support and advance the data-driven research paradigm that is currently emerging in the chemical and materials domain. <i>ChemML</i> allows its users to perform various data science tasks and execute machine learning workflows that are adapted specifically for the chemical and materials context. Key features are automation, general-purpose utility, versatility, and user-friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. <i>ChemML</i> is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data-driven <i>in silico</i> research outlined in our recent publication<sup>1</sup>.</div>
ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research.
This article is categorized under:
Software > Simulation Methods
Computer and Information Science > Chemoinformatics
Structure and Mechanism > Computational Materials Science
Software > Molecular Modeling