2022
DOI: 10.1109/access.2022.3151886

Character Detection and Segmentation of Historical Uchen Tibetan Documents in Complex Situations

Abstract: Tibetan is a low-resource language, and Tibetan culture carried by historical Tibetan documents is an important part of Chinese civilization. The study of historical Tibetan documents is of great significance to the protection of Tibetan culture and the promotion of Chinese culture. Character segmentation is an important step in image analysis and recognition of historical Tibetan documents. However, the following three challenges prevent solving problems of character segmentation in historical Tibetan documen…

Cited by 11 publications (3 citation statements)
References 31 publications
“…The Tibetan character ding datasets used in this paper are TCDB and HUTD [39] provided by Northwest Minzu University, in which the TCDB dataset contains the standard and variant data of Tibetan Uchen, and HUTD is the data of Tibetan antiquarian character ding, as shown in Table 1. This dataset is divided into three parts in total, which are Tibetan Uchen standard dataset, Tibetan Uchen variants dataset and Tibetan Uchen antiquarian character ding dataset.…”
Section: Experimental Results and Analysis (mentioning)
confidence: 99%
“…The research on document images of Historical Tibetan document in recent years has also achieved corresponding results: binarization processing, layout analysis, line segmentation, word segmentation, and character recognition. Zhao Penghai et al proposed a solution to the problem of pseudo-adhesion in the binarization method of Historical Tibetan document by fusing the attention mechanism with U-network, which has made an outstanding contribution to the construction of the basic dataset of Historical Tibetan document, and the obtained binarized dataset is very good for both background and line adhesion [22]. Zhang Ce et al proposed a structural attribute-based method for segmentation Tibetan characters in Ujjain script, which chunks syllables according to the position of tshegs and the position of the baseline in the syllable, and at the same time uses template matching for the differentiation and segmentation of the adhering strokes, and the method effectively solves the problem of segmentation the character dings in the Historical Tibetan document [23].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, some research has been dedicated to detecting and recognizing other ethnic characters. In the detection and recognition of the Tibetan character, Zhang et al [18] proposed an innovative Tibetan character segmentation approach grounded in critical feature information. The method effectively tackled challenges such as text lines with varying degrees of tilt and distortion, overlapping and intersecting character strokes, and the diversity in stroke styles.…”
Section: Tibetan Character Detection and Recognition (mentioning)
confidence: 99%