There are printed artistic documents in which the text lines on a single page are not parallel to each other: the lines may have different orientations, or they may be curved. For optical character recognition (OCR) of such documents, these lines must be extracted correctly. In this paper, we propose a novel scheme, based mainly on the water reservoir analogy, to extract individual text lines from printed Indian documents containing multi-oriented and/or curved text lines. A reservoir is a metaphor for the cavity region of a character in which water can be stored. In the proposed scheme, connected components are first labeled and identified as either isolated or touching. Next, each touching component is classified as either straight type (S-type) or curve type (C-type), depending on the reservoir base area and the envelope points of the component. Based on its type (S-type or C-type), two candidate points are computed from each touching component. Finally, the candidate regions (neighborhoods of the candidate points) of each component are detected, and by analyzing these regions, components are grouped into individual text lines.
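The abstract does not say how reservoirs are computed. As a rough sketch, assuming a binary component stored as a list of 0/1 rows, the top reservoir can be approximated by pouring water onto the component's upper profile, in the same way as the classic trapped-rain-water computation. This simplification ignores cavities hidden under overhanging strokes, and the names `top_reservoir_depths` and `reservoir_area` are hypothetical, not taken from the paper.

```python
from typing import List

Grid = List[List[int]]  # 1 = ink pixel, 0 = background


def top_reservoir_depths(component: Grid) -> List[int]:
    """Water depth per column when water is poured from the top.

    Simplified model (assumption): only the upper profile of the component is
    used, so cavities hidden under overhanging strokes are ignored.
    """
    n_rows = len(component)
    n_cols = len(component[0]) if n_rows else 0

    # Profile height of each column: distance from the bottom edge of the
    # bounding box up to the topmost ink pixel (0 if the column is empty).
    heights = []
    for x in range(n_cols):
        top = next((y for y in range(n_rows) if component[y][x]), None)
        heights.append(0 if top is None else n_rows - top)

    # Water can rise only as high as the lower of the tallest walls to the
    # left and to the right of a column.
    left_max, right_max = [0] * n_cols, [0] * n_cols
    running = 0
    for x in range(n_cols):
        running = max(running, heights[x])
        left_max[x] = running
    running = 0
    for x in reversed(range(n_cols)):
        running = max(running, heights[x])
        right_max[x] = running

    return [min(left_max[x], right_max[x]) - heights[x] for x in range(n_cols)]


def reservoir_area(component: Grid, from_top: bool = True) -> int:
    """Number of background pixels filled with water.

    Bottom reservoirs are obtained by flipping the component upside down.
    """
    grid = component if from_top else component[::-1]
    return sum(top_reservoir_depths(grid))


if __name__ == "__main__":
    u_shape = [
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
    ]
    print(top_reservoir_depths(u_shape))  # [0, 3, 3, 3, 0]
    print(reservoir_area(u_shape, from_top=False))  # 0
```

A "U"-like component holds a 3 × 3 top reservoir and no bottom reservoir. Quantities such as the reservoir base area used for the S-type/C-type decision would be derived from these per-column depths, but the exact rule is not given in the abstract.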
Text/graphics separation is one of the main concerns in current document image analysis research. The problem becomes more complex when text and graphics overlap, as in color map images. This paper discusses a number of improvements that make text/graphics separation methods suitable for maps, with emphasis on the regions where text and graphics overlap. It also discusses a clustering-based color separation method used as a step toward text/graphics separation.
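The abstract does not name the clustering algorithm. As an illustration only, the sketch below swaps in plain k-means on RGB values with a deterministic farthest-point initialization, so that each color cluster becomes a separate binary layer for later text/graphics separation. The functions `kmeans_colors` and `color_layer` are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def _sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _nearest(p: Color, centers: List[Color]) -> int:
    return min(range(len(centers)), key=lambda c: _sq_dist(p, centers[c]))


def kmeans_colors(pixels: List[Color], k: int, max_iter: int = 50):
    """Cluster RGB pixels into k color layers; returns (centers, labels)."""
    if len(set(pixels)) < k:
        raise ValueError("need at least k distinct colors")
    # Deterministic farthest-point initialization: start from the first pixel,
    # then repeatedly add the pixel farthest from all chosen centers.
    centers: List[Color] = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(_sq_dist(p, c) for c in centers)))

    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest center.
        labels = [_nearest(p, centers) for p in pixels]
        # Update step: move each center to the mean color of its members.
        new_centers: List[Color] = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def color_layer(labels: List[int], cluster: int) -> List[int]:
    """Binary mask (1 = pixel belongs to the given color cluster)."""
    return [1 if lab == cluster else 0 for lab in labels]
```

Farthest-point seeding is chosen here only because it makes the result reproducible; random or k-means++ seeding would work as well, and the number of clusters `k` would in practice depend on how many ink colors a given map uses.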
Bengali, one of the official languages of the Indian subcontinent, has an alphabet of 50 letters, of which 11 are vowels and 39 are consonants. In addition, Bengali words contain compound characters and modifiers. Compound characters are formed by combining parts of single characters, and modifiers are parts of vowels and consonants that make sense only when adjacent to or attached to a letter. In this paper, features of Bengali characters are studied using a hierarchical structure. The first few layers use features that broadly classify the characters into small groups, while the features at the lower levels are more specific to each character within a group. Higher-level features are identified from pixel density and arrangement, while lower-level features are identified using a chain code technique. The algorithm progresses successively through the groups in the hierarchy until it finds a match for the input character.
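The abstract does not specify which chain code variant is used. A minimal sketch, assuming the standard Freeman 8-direction code computed over an ordered list of boundary pixels (boundary tracing itself is omitted), with a normalized direction histogram as one possible lower-level feature. The names `freeman_chain_code` and `direction_histogram` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) in image coordinates, y grows downward

# Freeman 8-direction codes: 0 = east, then counter-clockwise in 45° steps.
FREEMAN_DIRS: Dict[Tuple[int, int], int] = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def freeman_chain_code(contour: List[Point]) -> List[int]:
    """Encode consecutive 8-adjacent boundary pixels as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_DIRS:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-adjacent")
        codes.append(FREEMAN_DIRS[step])
    return codes


def direction_histogram(codes: List[int]) -> List[float]:
    """Fraction of steps in each of the 8 directions (a rotation-sensitive feature)."""
    counts = [0] * 8
    for c in codes:
        counts[c] += 1
    total = len(codes)
    return [n / total for n in counts] if total else [0.0] * 8


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # closed contour
    codes = freeman_chain_code(square)
    print(codes)  # [0, 6, 4, 2]
    print(direction_histogram(codes))
```

Within a group of the hierarchy, such histograms (or the raw code strings) could be compared against per-character templates to find a match, though the paper's actual matching criterion is not stated.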