Cataloged from PDF version of article. Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. We exploit the fact that divans exist in multiple copies (versions) produced by different writers in different writing styles, and that word segmentation may be relatively easier in some versions than in others: versions that are difficult, if not impossible, to segment with traditional techniques are segmented using information carried over from a simpler version. One version of a document is used as the source dataset and another version of the same document as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset in order to detect word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words, and we improve the performance of simple features by considering words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and for retrieving words across documents.
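The idea of matching a query word against combinations of consecutive sub-words can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function name `best_subword_span`, the `max_parts` limit, and above all the use of pixel width as the only feature. The abstract does not state which features or distance the authors use, so this is not their matching scheme, only the general shape of a search over sub-word combinations.

```python
def best_subword_span(query_width, subword_widths, max_parts=4):
    """Find the run of consecutive sub-words whose combined width is
    closest to the width of the query word.

    Width is a hypothetical stand-in feature; a real system would compare
    richer image descriptors. Returns (cost, start, end) with inclusive
    indices into subword_widths, or None if the line is empty.
    """
    best = None
    for start in range(len(subword_widths)):
        total = 0
        # Try merging 1, 2, ..., max_parts consecutive sub-words.
        stop = min(start + max_parts, len(subword_widths))
        for end in range(start, stop):
            total += subword_widths[end]
            cost = abs(total - query_width)
            # Strict comparison keeps the earliest span on ties.
            if best is None or cost < best[0]:
                best = (cost, start, end)
    return best


# A target line with five sub-words and a query word 50 pixels wide.
span = best_subword_span(50, [20, 12, 18, 40, 9])
```

In this toy setting the matched span marks a candidate word boundary in the target version: the word is taken to start at sub-word `start` and end at sub-word `end`.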
Repeated patterns, namely rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They give a poem its integrity by connecting its parts and bring melody to its voice. In Ottoman literature, following the nazire (creative imitation) tradition, poets wrote their works by reusing the rhymes and redifs of previous poems, either to prove their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose a matching criterion and a method based on it, Redif Extraction using Contour Segments (RECS), that detects redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of 0.682 in a test collection of 100 poems.
Millions of manuscripts and printed texts are available in the Ottoman language. The automatic categorization of Ottoman texts would make these documents much more accessible in various applications ranging from historical investigations to literary analyses. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that it is possible to develop effective Automatic Text Categorization techniques that can be applied to the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten different poets: two poets from each of five hundred-year periods ranging from the 15th to the 19th century. The experimental results show that it is possible to obtain highly accurate classifications in terms of poet and time period. Using statistical analysis, we are able to recommend which style marker and machine learning method should be used in future studies.
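The four style markers named above can be computed from a transcribed text roughly as follows. This is a minimal sketch under stated assumptions: the function name `style_markers`, the `top_k` parameter, and whitespace tokenization with lowercasing are all illustrative choices, not the preprocessing used in the study. The resulting counts would then be turned into feature vectors for a classifier such as Naïve Bayes or a Support Vector Machine.

```python
from collections import Counter


def style_markers(text, top_k=3):
    """Compute four simple style markers from a Latin-alphabet transcription.

    Tokenization is plain whitespace splitting (an assumption); tokens are
    word occurrences and types are distinct words.
    """
    tokens = text.lower().split()
    types = set(tokens)
    return {
        # The top_k most frequent words in the text.
        "most_frequent_words": [w for w, _ in Counter(tokens).most_common(top_k)],
        # How many tokens have each character length.
        "token_lengths": Counter(len(t) for t in tokens),
        # Counts of adjacent word pairs (two-word collocations).
        "collocations": Counter(zip(tokens, tokens[1:])),
        # How many distinct words have each character length.
        "type_lengths": Counter(len(t) for t in types),
    }


# Toy input written without diacritics, purely for illustration.
markers = style_markers("gul bulbul gul bahar gul bulbul", top_k=2)
```

Token lengths and type lengths differ only in whether repeated words are counted once or every time they occur, which is why both are computed from the same text.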
In this study, we address the problem of matching patterns in Kufic calligraphy images. Because they are used as decorative elements, Kufic images are designed in a way that makes them difficult for non-experts to read. Therefore, available methods for handwriting recognition are not easily applicable to the recognition of Kufic patterns. We propose two new methods for Kufic pattern matching. The first method approximates the contours of connected components by lines and then uses a chain code representation. Sequence matching techniques with a penalty for gaps are exploited to handle the variations between different instances of sub-patterns. In the second method, the skeletons of connected components are represented as graphs in which junction points and end points are the nodes. Graph isomorphism techniques are then relaxed for partial graph matching. The methods are evaluated over a collection of 270 square Kufic images with 8,941 sub-patterns. Experimental results indicate that, besides retrieval and indexing of known patterns, the proposed approach also allows the discovery of new patterns.
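The first method's two ingredients, a chain code and gap-penalized sequence matching, can be sketched as below. The sketch assumes an 8-direction Freeman chain code and a standard global alignment (Needleman-Wunsch) score. The abstract does not say which chain code variant, alignment algorithm, or scores the authors use, so the function names and the `match`, `mismatch`, and `gap` values are illustrative assumptions.

```python
# 8-direction Freeman codes keyed by the (dx, dy) step between neighbours.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points):
    """Encode a path of 8-connected (x, y) points as direction codes."""
    return [
        DIRECTIONS[(x2 - x1, y2 - y1)]
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score of two code sequences with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
        prev = cur
    return prev[-1]


codes = chain_code([(0, 0), (1, 0), (1, 1), (0, 1)])  # right, up, left
same = align_score(codes, codes)
shorter = align_score(codes, [0, 4])  # one segment missing
```

The gap penalty is what lets two instances of the same sub-pattern match even when one of them has an extra or missing contour segment: the alignment pays a small cost for the gap instead of misaligning everything after it.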