1996
DOI: 10.1093/llc/11.4.193

Automatic morphological analysis of Basque

citations
Cited by 35 publications
(19 citation statements)
references
References 14 publications
“…The preprocessing pipeline takes raw texts and applies a series of Basque linguistic processors to analyse the texts: i) A morphological analyser that performs word segmentation and PoS tagging (Alegria et al, 1996), ii) A lemmatiser that resolves the ambiguity caused at the previous phase (Alegria et al, 2002), iii) A multi-word item identifier that determines which groups of two or more words are to be considered multi-word expressions (Alegria et al, 2004), iv) A named-entity recogniser that identifies and classifies named entities (person, organisation, location) in the text (Alegria et al, 2003), v) A chunker, an analyser that identifies verbal and nominal chunks based on rule-based grammars (Aduriz and Díaz de Ilarraza, 2003), vi) A clause tagger, that is, an analyser that identifies clauses, combining rule-based grammars and machine learning techniques (Alegria et al, 2008).…”
Section: Preprocessing and Mention Detection
Citation type: mentioning
confidence: 99%
“…It is an agglutinative, head-final, pro-drop, free-word order language (Laka, 1996). Naturally, the Basque language has also inspired a lot of work in Computational Linguistics with tools for automatically processing it becoming increasingly available (Alegria et al, 1996; Alegria et al, 2002; Alegria et al, 2003; Aduriz and Díaz de Ilarraza, 2003; Alegria et al, 2008). However, as it is the case with most less-resourced languages, there are tools for the core processing levels, such as tokenisation, sentence splitting, morphological analysis, syntactic parsing/chunking, but much less so for higher semantic levels required in end goal applications such as Question Answering (Morton, 2000), Text Summarisation (Steinberger et al, 2007) or Information Extraction (Def, 1995; Hirschman, 1998).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Basque is an agglutinative language with a special morpho-syntactic structure inside the words [4] that may lead to intractable vocabularies of words for a CSR when the size of task is large. A first approach to the problem is to use morphemes instead of words in the system in order to define the system vocabulary [5].…”
Section: Morphological Features Of Basque
Citation type: mentioning
confidence: 99%
“…Simplified sample of the output of the Transcriber free tool [10] enriched with morpho-syntactic information of Basque:
<Sync time="333.439"/>
+horretarako /hortarako/<Word lemma="hori" POS="ADB"/>
+denok /danok/<Word lemma="dena" POS="IZL"/>
lagundu<Word lemma="lagundu" POS="ADI"/>
behar<Word lemma="behar" POS="ADI"/>
dugu<Word lemma="*ukan" POS="ADL"/>
.
</Turn>
<Turn mode="spontaneous" fidelity="high" startTime="335.182" endTime="336.065">
<Sync time="335.182"/>
^Batasunak<Word lemma="9batasuna" POS="IZB"/>
As Basque is an agglutinative language with very rich inflection variety [4], Basque XML files include morphologic information such as each word's lemma and Part-Of-Speech tag. This information could be very useful in the development of Language Models for the recognition of continuous Speech in this context.…”
Section: Processing Of the Audio Data
Citation type: mentioning
confidence: 99%
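
The enriched transcript quoted above attaches a `<Word lemma="…" POS="…"/>` element to each surface token. A minimal sketch of how such annotations might be pulled out is given below. It is not the citing authors' tooling: the regular expression, the function name `extract_words`, and the reading of the `/…/` field as a pronunciation variant are assumptions made for illustration, and the sample string is copied from the quoted fragment.

```python
import re

# Fragment copied from the quoted sample (first turn only).
SAMPLE = (
    '<Sync time="333.439"/> +horretarako /hortarako/<Word lemma="hori" POS="ADB"/> '
    '+denok /danok/<Word lemma="dena" POS="IZL"/> lagundu<Word lemma="lagundu" POS="ADI"/> '
    'behar<Word lemma="behar" POS="ADI"/> dugu<Word lemma="*ukan" POS="ADL"/> .'
)

# Hypothetical pattern: a surface form, an optional /variant/ field
# (assumed here to be a pronunciation variant), then the <Word .../> element.
WORD_RE = re.compile(
    r'(?P<form>[^\s<>/]+)'
    r'(?:\s*/(?P<pron>[^/\s<>]*)/)?'
    r'\s*<Word\s+lemma="(?P<lemma>[^"]*)"\s+POS="(?P<pos>[^"]*)"\s*/>'
)


def extract_words(text):
    """Return (form, variant, lemma, pos) tuples; variant is None when absent."""
    return [
        (m["form"], m["pron"], m["lemma"], m["pos"])
        for m in WORD_RE.finditer(text)
    ]


if __name__ == "__main__":
    for form, variant, lemma, pos in extract_words(SAMPLE):
        print(form, variant, lemma, pos, sep="\t")
```

Surface markers such as `+` and `^` are kept verbatim, since the quoted passage does not say what they mean. A regular expression is used rather than an XML parser because the quoted fragment is not a well-formed document on its own.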