One cultural heritage that is now starting to be forgotten is the papyrus library. Balinese papyrus was one of the media on which minstrels recorded their ideas in ancient times. Today, much of the ancient literature written on papyrus is very difficult to identify because the writing has begun to rot or fade with age. Recognition of Balinese script on papyrus can begin with digitization, in which each papyrus is scanned into an image file. The papyrus image is then thresholded, because the thinning process requires a binary image as input. The Zhang-Suen thinning method is very effective: using two sub-iterations per pass, it reduces the strokes of the original image to a skeleton one pixel wide. The benefit of this research is improved image quality, which supports further segmentation and makes the text on the papyrus easier to read.
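The threshold-then-thin pipeline described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the scan is already a grayscale image stored as nested lists with dark ink on a light background, and it uses a fixed global threshold of 128 because the abstract does not say which thresholding method was used. The function names `threshold` and `zhang_suen` are hypothetical. The thinning follows the standard Zhang-Suen rules, with two sub-iterations per pass, repeated until a full pass deletes nothing; border pixels are left untouched.

```python
from typing import List

Image = List[List[int]]  # 1 = ink (foreground), 0 = background


def threshold(gray: List[List[int]], t: int = 128) -> Image:
    """Assumed fixed global threshold: pixels darker than t become ink (1)."""
    return [[1 if v < t else 0 for v in row] for row in gray]


def zhang_suen(binary: Image) -> Image:
    """Zhang-Suen thinning of a binary image; the input is not modified."""
    img = [row[:] for row in binary]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r: int, c: int) -> List[int]:
        # P2..P9, clockwise, starting with the pixel directly above (r-1, c)
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:  # repeat full passes until nothing more can be removed
        changed = False
        for step in (0, 1):  # the two sub-iterations
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # B(P1): number of ink neighbours
                    # A(P1): number of 0 -> 1 transitions in P2, P3, ..., P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:  # first sub-iteration: south-east boundary
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # second sub-iteration: north-west boundary
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # delete only after scanning, so each sub-iteration sees one snapshot
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

Applied to a 3×3 solid block inside a 5×5 image, for example, the two sub-iterations peel the block down to its single centre pixel, while a stroke that is already one pixel wide is returned unchanged. For real scans, a library routine such as OpenCV's Zhang-Suen thinning would be far faster than this pure-Python loop.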
This paper proposes LONTAR_DETC, a method for detecting handwritten Balinese characters in Lontar manuscripts. LONTAR_DETC is a deep learning architecture based on YOLO. Detecting Balinese characters in Lontar manuscripts is challenging because the characters are dense and overlapping, have high variance, contain noise, and are distributed across imbalanced classes. The proposed method consists of three steps: data generation, Lontar manuscript annotation, and Balinese character detection. The first step, data generation, produces synthetic images with enhanced quality from the original Lontar manuscript images. The second step, data annotation, builds a new Lontar manuscript dataset. As a result, we also propose the Handwritten Balinese Character of Lontar manuscript (HBCL_DETC) dataset, a novel dataset of Balinese characters in Lontar manuscripts created specifically for the detection task. HBCL_DETC contains 600 images with more than 100,000 Balinese characters annotated by experts. The third step trains the YOLOv4 detection model on the HBCL_DETC dataset. To evaluate the reliability of the dataset, we experimented with three scenarios: in the first, the detection model was trained on the original Lontar manuscript images; in the second, it was trained with additional augmented grayscale images; and in the third, it was trained on HBCL_DETC. The experimental results show that LONTAR_DETC detects Balinese characters with a high detection rate, achieving an mAP of 99.55%.
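To make the reported mAP concrete, here is a minimal sketch of how mean average precision is commonly computed for a detector. The abstract does not state the IoU threshold or the interpolation scheme, so this assumes an IoU threshold of 0.5 and PASCAL VOC-style all-point interpolation. It is not the authors' evaluation code, and the names `iou`, `average_precision` and `mean_ap` are hypothetical. Boxes are `(x1, y1, x2, y2)` tuples; each detection is matched to the unclaimed ground-truth box it overlaps most, and duplicate hits count as false positives.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pred = Tuple[str, float, Box]            # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]], thr: float = 0.5) -> float:
    """AP for one class with all-point interpolation (assumed VOC-style)."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    used = {k: [False] * len(v) for k, v in gts.items()}
    hits = []
    for img_id, _, box in sorted(preds, key=lambda p: -p[1]):  # highest score first
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img_id, [])):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        if best_j >= 0 and best >= thr and not used[img_id][best_j]:
            used[img_id][best_j] = True  # true positive
            hits.append(1)
        else:
            hits.append(0)  # false positive (no match or duplicate)
    recalls, precisions, tp = [], [], 0
    for i, h in enumerate(hits, 1):
        tp += h
        recalls.append(tp / n_gt)
        precisions.append(tp / i)
    # make precision non-increasing from right to left (precision envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):  # area under the step curve
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(preds_by_class: Dict[str, List[Pred]],
            gts_by_class: Dict[str, Dict[str, List[Box]]], thr: float = 0.5) -> float:
    """mAP: mean of per-class AP over every class that has ground truth."""
    aps = [average_precision(preds_by_class.get(c, []), g, thr) for c, g in gts_by_class.items()]
    return sum(aps) / len(aps)
```

Under this definition, an mAP of 99.55% means that, averaged over character classes, almost every annotated character is found before any false detection is ranked above it. Classes with no detections contribute an AP of 0, which is one reason class imbalance matters for this metric.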