Learning argument/adjunct distinction for Basque

Aldezabal, Izaskun; Aranzabe, Maxux; Gojenola, Koldo; Sarasola, Kepa; Atutxa, Aitziber

doi:10.3115/1118627.1118633

Cited by 8 publications

(8 citation statements)

References 13 publications


“…We reuse subcategorization patterns (Aldezabal et al 2002) taken from a corpus created with texts of the Euskaldunon Egunkaria daily newspaper. The patterns provide the following data: the verb lemma, the grammatical case of the subject, the list of postpositions for the other constituents, the transitivity of the verb corresponding to this combination and the frequency of the pattern in the corpus.…”

Section: Subcategorization Patterns (mentioning)

confidence: 99%
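
The five fields listed in the statement above map naturally onto a small record type. The following is a minimal sketch in Python, not the format used by Matxin or by Aldezabal et al.: the class name `SubcatPattern`, the field names, the helper `most_frequent`, and the toy values in `EXAMPLES` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubcatPattern:
    """One subcategorization pattern (hypothetical layout)."""
    verb_lemma: str                 # the verb lemma
    subject_case: str               # grammatical case of the subject
    postpositions: tuple[str, ...]  # postpositions of the other constituents
    transitivity: str               # transitivity for this combination
    frequency: int                  # frequency of the pattern in the corpus


def most_frequent(patterns, verb_lemma):
    """Return the highest-frequency pattern for a verb, or None if absent."""
    candidates = [p for p in patterns if p.verb_lemma == verb_lemma]
    return max(candidates, key=lambda p: p.frequency, default=None)


# Invented toy entries, for illustration only (not corpus data).
EXAMPLES = [
    SubcatPattern("etorri", "abs", ("-tik", "-ra"), "intransitive", 12),
    SubcatPattern("etorri", "abs", ("-ra",), "intransitive", 7),
]
```

Storing the frequency alongside each pattern lets a consumer pick the most common frame for a verb when several frames match.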

Matxin, an open-source rule-based machine translation system for Basque

Mayor, Alegria, Ilarraza, et al. 2011, Machine Translation

Self Cite


We present the first publicly available machine translation (MT) system for Basque. The fact that Basque is both a morphologically rich and less-resourced language makes the use of statistical approaches difficult, and raises the need to develop a rule-based architecture which can be combined in the future with statistical techniques. The MT architecture proposed reuses several open-source tools and is based on a unique XML format to facilitate the flow between the different modules, which eases the interaction among different developers of tools and resources. The result is the rule-based Matxin MT system, an open-source toolkit, whose first implementation translates from Spanish to Basque. We have performed innovative work on the following tasks: construction of a dependency analyser for Spanish, use of rich linguistic information to translate prepositions and syntactic functions (such as subject and object markers), construction of an efficient module for verbal chunk transfer, and design and implementation of modules for ordering words and phrases, independently of the source language.


“…Previous work focusing specifically on the automatic complement-adjunct distinction varies from generating candidate complements and utilizing statistical filtering on the candidates in order to filter out adjuncts (Aldezabal, Aranzabe, Atutxa, Gojenola and Sarasola 2002) to making use of memory-based learning methods (Buchholz 1998) and decision trees (Merlo and Leybold 2001). Buchholz (1998) describes a set of experiments performed on the part-of-speech tagged and phrase structured part of the Wall Street Journal.…”

Section: Previous Work (mentioning)

confidence: 99%

“…Buchholz (1998) describes a set of experiments performed on the part-of-speech tagged and phrase structured part of the Wall Street Journal. Aldezabal et al (2002) work on Basque. The last feature of the vector corresponds to the target class.…”

Section: Previous Work (mentioning)

confidence: 99%


Learning verb complements for Modern Greek: balancing the noisy dataset

et al. 2006


Attempting to automatically learn to identify verb complements from natural language corpora without the help of sophisticated linguistic resources like grammars, parsers or treebanks leads to a significant amount of noise in the data. In machine learning terms, where learning from examples is performed using class-labelled feature-value vectors, noise leads to an imbalanced set of vectors: assuming that the class label takes two values (in this work complement/non-complement), one class (complements) is heavily underrepresented in the data in comparison to the other. To overcome the drop in accuracy when predicting instances of the rare class due to this disproportion, we balance the learning data by applying one-sided sampling to the training corpus and thus by reducing the number of non-complement instances. This approach has been used in the past in several domains (image processing, medicine, etc.) but not in natural language processing. For identifying the examples that are safe to remove, we use the value difference metric, which proves to be more suitable for nominal attributes like the ones this work deals with, unlike the Euclidean distance, which has been used traditionally in one-sided sampling. We experiment with different learning algorithms that have been widely used and whose performance is well known to the machine learning community: Bayesian learners, instance-based learners and decision trees. Additionally we present and test a variation of Bayesian belief networks, the COr-BBN (Class-oriented Bayesian belief network). The performance improves by up to 22% after balancing the dataset, reaching 73.7% f-measure for the complement class, having made use of only a phrase chunker and basic morphological information for preprocessing.
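
The value difference metric mentioned in the abstract above compares two nominal values by how differently they distribute over the classes. The following is a minimal Python sketch of one common formulation (summing |P(c|x) - P(c|y)|^q over classes c and over attributes), not the authors' implementation: the function names `fit_vdm` and `vdm_distance`, the exponent default `q=2`, and the toy data in the usage note are assumptions.

```python
from collections import Counter, defaultdict


def fit_vdm(rows, labels):
    """Estimate P(class | attribute value) for each attribute position."""
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, value)][label] += 1
    classes = sorted(set(labels))
    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probs[key] = {c: counter[c] / total for c in classes}
    return probs, classes


def vdm_distance(a, b, probs, classes, q=2):
    """Value difference metric between two instances of nominal attributes.

    Raises KeyError for attribute values not seen by fit_vdm.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        px, py = probs[(i, x)], probs[(i, y)]
        total += sum(abs(px[c] - py[c]) ** q for c in classes)
    return total
```

With invented rows `[("dat",), ("dat",), ("loc",), ("loc",)]` and labels `["C", "N", "N", "N"]`, the value `dat` is split evenly between the classes while `loc` always predicts `N`, so the two values end up a non-zero distance apart even though they have no numeric encoding.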


“…They incorporate semantic verb class, preposition and noun cluster information and reach an accuracy of 86.5% with a training set of 3692 and a test set of 400 instances. Aldezabal et al (2002) work on Basque. They apply mutual information and Fisher's Exact Test to verb-case pairs (a case is any type of argument) which were obtained from a partially parsed newspaper corpus of 1.3 million words.…”

Section: Introduction (mentioning)

confidence: 99%
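
The two association measures named in the statement above can be computed from simple co-occurrence counts. The following is a minimal Python sketch under stated assumptions, not the procedure of Aldezabal et al.: it uses pointwise mutual information in bits and a one-sided Fisher's exact test, and the function names, argument names, and the choice of a right-tailed test are hypothetical.

```python
from math import comb, log2


def pmi(joint, verb_total, case_total, n):
    """Pointwise mutual information (bits) of a verb-case pair.

    joint: occurrences of the verb together with the case
    verb_total, case_total: marginal counts; n: total observations
    """
    return log2((joint * n) / (verb_total * case_total))


def fisher_right_tail(joint, verb_total, case_total, n):
    """One-sided Fisher's exact test: P(X >= joint) under independence.

    X follows a hypergeometric distribution over the 2x2 table
    defined by the marginal counts.
    """
    upper = min(verb_total, case_total)
    tail = sum(
        comb(verb_total, k) * comb(n - verb_total, case_total - k)
        for k in range(joint, upper + 1)
    )
    return tail / comb(n, case_total)
```

A high PMI together with a small p-value suggests the case co-occurs with the verb more often than chance would predict, which is the kind of evidence a statistical filter can use to separate likely arguments from adjuncts.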

Learning Greek verb complements

Kermanidis, Maragoudakis, Fakotakis, et al. 2004, Proceedings of the 20th International Conference on Computational Linguistics - COLING '04


Imbalanced training sets, where one class is heavily underrepresented compared to the others, have a bad effect on the classification of rare class instances. We apply One-sided Sampling for the first time to a lexical acquisition task (learning verb complements from Modern Greek corpora) to remove redundant and misleading training examples of verb nondependents and thereby balance our training set. We experiment with well-known learning algorithms to classify new examples. Performance improves up to 22% in recall and 15% in precision after balancing the dataset.
