2022
DOI: 10.1145/3527663

Sentence Boundary Disambiguation for Tibetan Based on Attention Mechanism at the Syllable Level

Abstract: Tibetan is a low-resource language with few existing electronic reference materials. The goal of Tibetan sentence boundary disambiguation (SBD) is to segment long text into sentences, and it is the foundation for building corpora for downstream tasks. This study implemented Tibetan SBD at the syllable level so that word segmentation (WS) errors do not affect the accuracy of SBD. Specifically, an attention mechanism is introduced on top of a recurrent neural network (RNN) to study Tibetan SBD. The primary objectiv…

Cited by 2 publications (1 citation statement)
References 12 publications
“…One example of this is the Tibetan punctuation mark "།" (ཤད shad), which can be used after a word, phrase, or sentence. As a result, it can be unclear whether a shad is meant to indicate the end of a sentence or not (Li et al, 2022).…”
Section: Sentence Boundaries and Punctuation (mentioning)
confidence: 99%
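The ambiguity described in the citation statement can be made concrete with a minimal Python sketch. It splits text into syllables on the tsheg (་, U+0F0B) and lists every shad (།, U+0F0D) together with its neighbouring syllables; each such position is a candidate sentence boundary that a classifier would then have to accept or reject. This is not the paper's attention-based RNN model. The function names, the `width` parameter, and the sample text are assumptions chosen for illustration only.

```python
import re

TSHEG = "\u0f0b"  # ་ separates syllables
SHAD = "\u0f0d"   # ། may or may not end a sentence


def syllables_and_shads(text):
    """Split text into a flat list of syllables and shad marks."""
    tokens = []
    # The capture group keeps each shad as its own item in the split result.
    for chunk in re.split(f"({SHAD})", text):
        if chunk == SHAD:
            tokens.append(SHAD)
        else:
            # Syllables are delimited by tsheg marks or whitespace.
            tokens.extend(s for s in re.split(f"[{TSHEG}\\s]+", chunk) if s)
    return tokens


def candidate_windows(tokens, width=3):
    """Return (index, left context, right context) for every shad."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == SHAD:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            windows.append((i, left, right))
    return windows


if __name__ == "__main__":
    sample = "བཀྲ་ཤིས་བདེ་ལེགས། ང་བོད་པ་ཡིན།"  # illustrative text only
    for index, left, right in candidate_windows(syllables_and_shads(sample)):
        print(index, left, right)
```

Working on syllables rather than words means the context windows need no word segmenter, which is the motivation the abstract gives for the syllable-level approach.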