2022
DOI: 10.1017/s1351324922000134

Real-world sentence boundary detection using multitask learning: A case study on French

Abstract: We propose a novel approach for sentence boundary detection in text datasets in which boundaries are not evident (e.g., sentence fragments). Although detecting sentence boundaries without punctuation marks has rarely been explored in written text, current real-world textual data suffer from widespread lack of proper start/stop signaling. Herein, we annotate a dataset with linguistic information, such as parts of speech and named entity labels, to boost the sentence boundary detection task. Via experiments, we …

Cited by 2 publications (4 citation statements)
References: 20 publications
“…The task of sentence segmentation can be performed by detecting sentence boundaries [33]. The general pattern of a sentence is that it begins with a capital letter and ends with a special punctuation mark such as a period, question mark, or exclamation mark.…”
Section: A Sentence Segmentation; type: mentioning
confidence: 99%
“…Many studies of sentence boundary detection are for English, but they are rarely explored for languages other than English [5], [8] including Indonesian. Research for Indonesian has been done by [14] which presents the development of a training dataset to optimize sentence boundary detection using the Indonesian translation of the Al-Quran with F measure 86.4%, [6] using a rule base by looking for patterns of sentence endings based only on a combination of spaces, capital letters or quotation marks.…”
Section: ISSN: 2252-8938; type: mentioning
confidence: 99%
“…[18] Using rule-based with 21 features and classification with k-means able to produce an average F1-score of 96.58%. [5] Proposed a multitasking neural model to detect sentence beginnings without relying on punctuation in written texts, obtaining an F1 score of up to 98.07%.…”
Section: ISSN: 2252-8938; type: mentioning
confidence: 99%