Using Unsupervised Paradigm Acquisition for Prefixes

Zeman, Daniel

doi:10.1007/978-3-642-04447-2_130

Cited by 7 publications

(3 citation statements)

References 6 publications

“…The logical choice for minimizing this problem would be to reduce the index by using some kind of pruning (Carmel et al, 2001) or term selection (Zeman, 2009) technique.…”

Section: Background and Related Work (mentioning)

confidence: 99%

On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks

Vilares, Alonso, et al. 2016

Computer Speech & Language

The field of Cross-Language Information Retrieval relates techniques close to both the Machine Translation and Information Retrieval fields, although in a context involving characteristics of its own. The present study looks to widen our knowledge about the effectiveness and applicability to that field of nonclassical translation mechanisms that work at character n-gram level. For the purpose of this study, an n-gram based system of this type has been developed. This system requires only a bilingual machine-readable dictionary of n-grams, automatically generated from parallel corpora, which serves to translate queries previously n-grammed in the source language. n-Gramming is then used as an approximate string matching technique to perform monolingual text retrieval on the set of n-grammed documents in the target language. The tests for this work have been performed on CLEF collections for seven European languages, taking English as the target language. The performance attained, close to the upper baseline, confirms the validity of character n-gram based approaches for Cross Language Information Retrieval tasks, both for indexing-retrieval and translation purposes, these not being tied to a given implementation.
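As a rough illustration of n-gramming used for approximate string matching, the sketch below splits text into overlapping character n-grams and scores two strings by the n-grams they share. The abstract does not state the n-gram length or the matching score, so `n=4`, the Dice coefficient, and both function names are assumptions made only for this example, not the cited system.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of every word in text.

    Words no longer than n characters are kept whole (an assumption).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice_overlap(a, b, n=4):
    """Dice coefficient over the sets of character n-grams of a and b."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    print(char_ngrams("tomatoes"))
    print(dice_overlap("tomato", "tomatoes"))
```

Because inflected forms of a word share most of their n-grams, a match can be scored without stemming or a dictionary of word forms.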

“…Computationally, inflection classes introduce nonuniformity across paradigms and must be handled in one way or another in an automatic morphology learning system. Previous work has opted to explicitly learn inflection classes (Goldsmith and O'Brien 2006) or collapse them in some way (Chan 2006, Hammarström 2009, Monson 2009, Zeman 2009).…”

Section: Inflection Classes (mentioning)

confidence: 99%

Morphological Paradigms: Computational Structure and Unsupervised Learning

Jackson 2015

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Rese

This thesis explores the computational structure of morphological paradigms from the perspective of unsupervised learning. Three topics are studied: (i) stem identification, (ii) paradigmatic similarity, and (iii) paradigm induction. All the three topics progress in terms of the scope of data in question. The first and second topics explore structure when morphological paradigms are given, first within a paradigm and then across paradigms. The third topic asks where morphological paradigms come from in the first place, and explores strategies of paradigm induction from child-directed speech. This research is of interest to linguists and natural language processing researchers, for both theoretical questions and applied areas.

“…Zeman [19], (a revised version of [15]) propose methods to include prefix identification. Words are reversed to detect prefixes, using rules over all possible prefixes simply yields the prefix candidates.…”

Section: Morpho Challenge 2008 (mentioning)

confidence: 99%
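
The word-reversal idea in the citation statement above can be sketched as follows: reverse every word, run a suffix-candidate collector on the reversed forms, and reverse the results back to obtain prefix candidates. This is a minimal sketch and not Zeman's paradigm acquisition algorithm. The naive frequency counter, the function names, and the `max_len` and `min_count` thresholds are all assumptions introduced for illustration.

```python
from collections import Counter


def suffix_candidates(words, max_len=4, min_count=2):
    """Count every word-final substring (up to max_len) that leaves a non-empty stem.

    A naive stand-in for a real suffix learner (assumption).
    """
    counts = Counter()
    for word in set(words):
        for k in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-k:]] += 1
    return {suffix: c for suffix, c in counts.items() if c >= min_count}


def prefix_candidates(words, max_len=4, min_count=2):
    """Find prefix candidates by reusing the suffix collector on reversed words."""
    reversed_words = [w[::-1] for w in words]
    found = suffix_candidates(reversed_words, max_len, min_count)
    # Reverse each candidate back so it reads as a prefix of the original words.
    return {suffix[::-1]: c for suffix, c in found.items()}


if __name__ == "__main__":
    vocabulary = ["unhappy", "undo", "untie", "redo", "retie", "happy"]
    print(prefix_candidates(vocabulary))
```

The point of the reversal is reuse: any procedure that proposes word-final segments can propose word-initial ones without modification, at the cost of one string reversal per word.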