The Bai people have a long history and their own language. As fewer and fewer people know the Bai language, the literature and culture of the Bai are rapidly being lost. To allow people who cannot read Bai characters to read ancient Bai books, this paper studies a high-precision single-character recognition model for Bai characters. First, with the help of Bai culture enthusiasts and scholars in the field, we constructed a dataset of Bai characters; because annotation requires expert knowledge, the dataset is limited in size. As a result, data-hungry deep learning models cannot reach satisfactory accuracy on it. To address this issue, we propose using a Chinese dataset, since Chinese also belongs to the Sino-Tibetan language family, to improve the recognition accuracy of Bai characters through transfer learning. We propose four transfer learning approaches: Direct Knowledge Transfer (DKT), Indirect Knowledge Transfer (IKT), Self-coding Knowledge Transfer (SCKT), and Self-supervised Knowledge Transfer (SSKT). Experiments show that our approaches greatly improve the recognition accuracy of Bai characters.
The Bai people have left behind a wealth of ancient texts that record their splendid civilization. Unfortunately, fewer and fewer people can read these texts today, so a model that can automatically recognize ancient (offset) Bai texts is of great practical value. However, annotating ancient (offset) texts requires expert knowledge, and the available data are limited in scale. We therefore consider using handwritten Bai texts to help recognize ancient (offset) Bai texts, because handwritten Bai texts can be easily obtained and annotated. Essentially, this is a domain adaptation problem, and several existing domain adaptation methods were adapted to ancient (offset) Bai text recognition. None of them achieved high performance, because they do not solve the problem of separating the style and content information of an image. To address this, we propose an information separation network (ISN) that effectively separates content and style information and classifies with content features only. Specifically, the network first uses a separator to divide the visual features into a style feature and a content feature. It then uses cross-domain cross-reconstruction to ensure that the style feature contains only style and the content feature contains only content. Finally, only the content feature is used for classification. This greatly reduces the impact of the domain shift. The proposed method achieves leading results on five public datasets and a private one.
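The separation idea described above can be illustrated with a toy sketch. This is not the authors' ISN: the abstract does not specify the separator, the decoder, or the loss. Here we assume, purely for illustration, that the separator is a fixed slice of the feature vector, that reconstruction is concatenation, and that cross-reconstruction means rebuilding each sample from its own style and the other domain's content for a same-character pair. All function names and feature values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def separate(feature: Sequence[float], content_dim: int) -> Tuple[Vector, Vector]:
    """Toy separator (assumption): the first `content_dim` entries are content, the rest style."""
    return list(feature[:content_dim]), list(feature[content_dim:])


def reconstruct(content: Sequence[float], style: Sequence[float]) -> Vector:
    """Toy decoder (assumption): rebuild a feature by concatenating content and style."""
    return list(content) + list(style)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_reconstruction_loss(feat_a: Sequence[float], feat_b: Sequence[float],
                              content_dim: int) -> float:
    """Swap content across the two domains and measure how well each sample is rebuilt.

    For a same-character pair the loss is small only if the content part carries
    no domain-specific (style) information.
    """
    content_a, style_a = separate(feat_a, content_dim)
    content_b, style_b = separate(feat_b, content_dim)
    rebuilt_a = reconstruct(content_b, style_a)  # content from B, style from A
    rebuilt_b = reconstruct(content_a, style_b)  # content from A, style from B
    return mse(rebuilt_a, feat_a) + mse(rebuilt_b, feat_b)


def classify_by_content(feature: Sequence[float], prototypes: Dict[str, Vector],
                        content_dim: int) -> str:
    """Nearest-prototype classification that ignores the style part entirely."""
    content, _ = separate(feature, content_dim)
    return min(prototypes, key=lambda label: mse(content, prototypes[label]))


# Hypothetical 4-dimensional features: 2 content dimensions, then 2 style dimensions.
handwritten = [1.0, 0.0, 0.2, 0.9]  # character A, handwritten style
ancient = [1.0, 0.0, 0.8, 0.1]      # character A, ancient (offset) style
other = [0.0, 1.0, 0.2, 0.9]        # character B, handwritten style
prototypes = {"char_A": [1.0, 0.0], "char_B": [0.0, 1.0]}
```

In this toy setting the same character receives the same label in both domains because the classifier never sees the style dimensions, and the cross-reconstruction loss is zero for the same-character pair but positive when the content differs. A real network would learn the separator and decoder rather than fix them by hand.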