2017
DOI: 10.1007/978-3-319-66984-7_10

Text Segmentation Techniques: A Critical Review

Abstract: Text segmentation is widely used in text processing. It is a method of splitting a document into smaller parts, which are usually called segments. Each segment carries its own relevant meaning. Depending on the task of the text analysis, these segments are categorized as words, sentences, topics, phrases or any other information unit. This study presents various reasons for using text segmentation in different analysis approaches. We categorized the types of documents and languages used. The main contribution of this study inclu…
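The abstract notes that the unit of a segment depends on the analysis task. As a minimal sketch of what choosing that unit means in practice, the following assumes plain English text and naive regular-expression rules. The helper name `segment` and its `unit` values are hypothetical illustrations, not the method of this review.

```python
import re
from typing import List


def segment(text: str, unit: str = "sentence") -> List[str]:
    """Split text into segments of the requested unit (hypothetical helper).

    Naive rules, assumed for illustration only:
      - "paragraph": a blank line ends a paragraph
      - "sentence":  '.', '!' or '?' followed by whitespace ends a sentence
      - "word":      a run of word characters, apostrophes or hyphens
    """
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif unit == "word":
        parts = re.findall(r"[\w'-]+", text)
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    # Drop empty pieces and surrounding whitespace left over by the splits.
    return [p.strip() for p in parts if p.strip()]
```

The same document yields two paragraphs, three sentences or seventeen words for the text "Text segmentation splits a document. Each part is a segment!" followed by a blank line and "Segments can be words, sentences or topics.". Real sentence splitters must also handle abbreviations such as "e.g." and decimal numbers, which these rules would split incorrectly.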

Cited by 35 publications (19 citation statements). References: 45 publications.

“…Segmentation includes methods that break a document down into independent and minimal textual components which are usually called segments or tokens. 62 A text segment is defined as a contiguous piece of text that is linked to itself but largely disconnected from the adjacent text. 63…”
Section: Definitions and Terminology (mentioning, confidence: 99%)
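The quoted definition (a segment is contiguous, linked to itself, and largely disconnected from adjacent text) is the intuition behind lexical-cohesion approaches to topic segmentation. Below is a deliberately simplified sketch, loosely in the spirit of TextTiling but comparing only adjacent sentences. The function names and the `threshold` value are assumptions, and this is not the procedure of the cited works.

```python
import math
import re
from collections import Counter
from typing import List


def _bag(sentence: str) -> Counter:
    # Lowercased word counts: a crude stand-in for a sentence's lexical content.
    return Counter(re.findall(r"\w+", sentence.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_segments(sentences: List[str], threshold: float = 0.1) -> List[List[str]]:
    """Group consecutive sentences, starting a new segment where lexical overlap drops.

    A boundary is placed between two neighbouring sentences when the cosine
    similarity of their word counts falls below `threshold` (assumed, untuned).
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if _cosine(_bag(prev), _bag(cur)) < threshold:
            segments.append([cur])    # weakly linked to preceding text: new segment
        else:
            segments[-1].append(cur)  # still cohesive: extend the current segment
    return segments
```

For the sentences "Cats are small furry pets.", "Many cats sleep most of the day.", "Stock markets fell sharply today." and "Investors sold stock after the markets opened.", the only boundary falls between the second and third sentence. Practical systems typically remove stopwords and compare windows of several sentences, since a shared function word alone can link unrelated text.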
“…A token is an instance of a sequence of characters that are semantically grouped together. 64 Some literature, such as Pak and Teh, 62 considers tokenization as a sub function of segmentation. To some extent, we agree they overlap and could be used interchangeably, however in the context of this paper, we refer to tokenization and segmentation as two stages, as defined here.…”
Section: Definitions and Terminology (mentioning, confidence: 99%)
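One way to keep the two stages from the quoted passage distinct is to have segmentation return character spans and then run tokenization inside each span, so that every token remains an instance tied to a position in the original text. Here is a minimal sketch under those assumptions. All names are hypothetical and the regular expressions are naive.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]        # (start, end) character offsets in the original text
Token = Tuple[str, int, int]  # (surface string, start, end) in the original text


def segment_sentences(text: str) -> List[Span]:
    """Stage 1, segmentation: spans of sentence-like segments (naive rule)."""
    spans = []
    # A segment starts at a non-space character and runs up to and including
    # any sentence-final punctuation.
    for m in re.finditer(r"\S[^.!?]*[.!?]*", text):
        seg = m.group().rstrip()  # drop trailing whitespace when no terminator follows
        spans.append((m.start(), m.start() + len(seg)))
    return spans


def tokenize(text: str, span: Span) -> List[Token]:
    """Stage 2, tokenization: word and punctuation tokens inside one segment."""
    start, end = span
    return [(m.group(), start + m.start(), start + m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text[start:end])]


def pipeline(text: str) -> List[List[Token]]:
    # Segment first, then tokenize each segment independently.
    return [tokenize(text, span) for span in segment_sentences(text)]
```

`pipeline("Tokens differ. Segments too!")` yields two segments. The first contains the tokens "Tokens", "differ" and "." at offsets (0, 6), (7, 13) and (13, 14). Keeping offsets lets later stages map any token back to its exact position in the source document.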
“…The structured data can be transformed into mathematical problems, and word segmentation is the first step in this transformation [14]. The basic unit of English text is a word [15]; therefore, standard word segmentation techniques are divided into three steps:…”
Section: Related Work (mentioning, confidence: 99%)
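The quoted passage describes word segmentation as the first step in turning text into a mathematical problem. The three steps it lists are elided here and are not reconstructed. As one hedged illustration (an assumption, not the cited paper's pipeline), the following sketch segments English text into words and then represents each document as a bag-of-words count vector. The names `words` and `bag_of_words` are hypothetical, and the word rule ignores digits and hyphens.

```python
import re
from typing import List, Tuple


def words(text: str) -> List[str]:
    """Naive English word segmentation: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn each document into a count vector over a shared, sorted vocabulary."""
    tokenized = [words(d) for d in docs]  # segment every document into words first
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for w in toks:
            vec[index[w]] += 1  # count each occurrence of each vocabulary word
        vectors.append(vec)
    return vocab, vectors
```

For "The cat sat." and "The cat saw the dog.", the vocabulary is cat, dog, sat, saw, the, and the vectors are [1, 0, 1, 0, 1] and [1, 1, 0, 1, 2]. Once text is in this numeric form, standard vector operations such as similarity or classification can be applied.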
“…Tasks such as word tagging and tokenizing are done in many different languages, including Arabic [9], Hebrew [10], German [11], Urdu [12], Burmese [13], Russian [14], Chinese [15] and Swedish [16]. In other words, the process of text segmentation involved in these studies has been used in many different languages for text analysis [17].…”
Section: Introduction (mentioning, confidence: 99%)