2002
DOI: 10.1007/3-540-45715-1_22
Formal Methods of Tokenization for Part-of-Speech Tagging

Cited by 14 publications (10 citation statements). References 2 publications.
“…Pre-processing: One of the most important prior tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the texts [56]. The Arabic language has a rich and complex morphology [44], [57], [58].…”
Section: A. The Proposed System Architecture
Mentioning confidence: 99%
“…The problems of Spanish pre-processing and segmentation have been studied in depth by Graña et al. (2002). Their work presents a linguistically based pre-processing segmenter system able to deal successfully with complex phenomena, such as multiword expressions, contractions, enclitic pronouns attached to verbs, and even segmentation ambiguities.…”
Section: Pre-Processing and Text Segmentation
Mentioning confidence: 99%
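The phenomena listed in this excerpt map naturally onto a small rule-driven pre-processing pass. The Python sketch below is only an illustration of the idea, not Graña et al.'s actual segmenter: the contraction table, multiword list, and enclitic heuristic are tiny hypothetical stand-ins for the wide-coverage lexical resources such a system would rely on.

import re

# Illustrative resources only; the cited system is lexicon-driven.
CONTRACTIONS = {"del": ["de", "el"], "al": ["a", "el"]}
MULTIWORDS = [("sin", "embargo"), ("a", "pesar", "de")]   # joined into single tokens
ENCLITICS = ["me", "te", "se", "nos", "os", "lo", "la", "le", "los", "las", "les"]

def split_enclitics(token):
    # Naive heuristic: peel enclitic pronouns off infinitive/gerund forms,
    # e.g. "verlo" -> ["ver", "lo"]; a real system disambiguates with a lexicon.
    clitics = []
    stem = token.lower()
    while True:
        for c in sorted(ENCLITICS, key=len, reverse=True):
            cand = stem[:-len(c)]
            if stem.endswith(c) and len(cand) > 1 and (cand.endswith("r") or cand.endswith("ndo")):
                clitics.insert(0, c)
                stem = cand
                break
        else:
            break
    return [stem] + clitics if clitics else [token]

def tokenize(text):
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # 1) expand contractions ("al" -> "a el", "del" -> "de el")
    expanded = []
    for t in tokens:
        expanded.extend(CONTRACTIONS.get(t.lower(), [t]))
    # 2) join known multiword expressions into single tokens
    joined, i = [], 0
    while i < len(expanded):
        for mw in MULTIWORDS:
            if tuple(w.lower() for w in expanded[i:i + len(mw)]) == mw:
                joined.append("_".join(expanded[i:i + len(mw)]))
                i += len(mw)
                break
        else:
            joined.append(expanded[i])
            i += 1
    # 3) detach enclitic pronouns from verb-like tokens
    final = []
    for t in joined:
        final.extend(split_enclitics(t))
    return final

print(tokenize("Sin embargo, fue al cine para verlo."))
# ['Sin_embargo', ',', 'fue', 'a', 'el', 'cine', 'para', 'ver', 'lo', '.']

The point of the sketch is that each phenomenon (contraction, multiword expression, enclitic pronoun) becomes a separate pass over the token stream, which is also how the segmentation ambiguities mentioned above arise: the same surface form can admit more than one split.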
“…These tools need to take into account the inflectional morphology of each specific language, its irregularities, and even its segmentation characteristics. This is the case, for example, of the work developed for both Spanish and Galician with the MrTagoo tagger-lemmatizer (Graña et al. 2001; Graña et al. 2002). Further, the forms of user queries should be studied in non-English languages to understand how users type their queries; whether, for example, they use the same terms in various inflected forms with various endings.…”
Section: Conflation
Mentioning confidence: 99%
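As an illustration of term conflation, the toy Python sketch below maps inflected query terms to lemmas through a lexicon lookup. The tiny dictionary is a hypothetical stand-in for the lexical resources behind a tagger-lemmatizer such as MrTagoo; it is not the conflation procedure described in the cited work.

# Toy lemma lexicon; real coverage would come from a full morphological lexicon.
LEXICON = {
    "casas": "casa", "casa": "casa",                    # noun: singular/plural
    "cantaba": "cantar", "cantamos": "cantar",          # verb forms -> infinitive
    "libros": "libro", "libro": "libro",
}

def conflate(term):
    # Map an inflected query term to its lemma; fall back to the surface form.
    return LEXICON.get(term.lower(), term.lower())

def conflate_query(query):
    return [conflate(t) for t in query.split()]

print(conflate_query("Casas libros cantaba"))   # ['casa', 'libro', 'cantar']

Lemma-based conflation of this kind groups the differently inflected query forms mentioned in the excerpt under a single index entry, which is what makes retrieval robust to inflectional variation.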
“…This working hypothesis is not realistic due to the heterogeneous nature of the application texts and their sources. For this reason, we have developed a preprocessor module [4,1], an advanced tokenizer which performs the following tasks:…”
Section: The Preprocessor
Mentioning confidence: 99%
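The excerpt elides the concrete task list, so nothing below should be read as the module described in [4, 1]. Purely as an organizational sketch, an advanced tokenizer of this kind can be arranged as a pipeline of passes over the token stream; the passes shown here are hypothetical placeholders.

from typing import Callable, Iterable, List

TokenPass = Callable[[List[str]], List[str]]

def strip_punct_pass(tokens: List[str]) -> List[str]:
    # Hypothetical pass: drop tokens with no alphanumeric character.
    return [t for t in tokens if any(ch.isalnum() for ch in t)]

def lowercase_pass(tokens: List[str]) -> List[str]:
    # Hypothetical pass: case normalization.
    return [t.lower() for t in tokens]

class Preprocessor:
    # Generic tokenizer pipeline: raw text in, normalized token stream out.
    def __init__(self, passes: Iterable[TokenPass]):
        self.passes = list(passes)

    def run(self, text: str) -> List[str]:
        tokens = text.split()          # naive baseline tokenization
        for token_pass in self.passes:
            tokens = token_pass(tokens)
        return tokens

pre = Preprocessor([strip_punct_pass, lowercase_pass])
print(pre.run("El gato , dijo Ana ."))   # ['el', 'gato', 'dijo', 'ana']

The pipeline arrangement makes it easy to insert heterogeneous, source-specific passes, which matches the motivation given in the excerpt for building a dedicated preprocessor module.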