2019
DOI: 10.1080/08839514.2019.1583447

An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language

Abstract: Corpus-based thesaurus construction for Morphologically Rich Languages (MRL) is a complex task, due to the morphological variability of MRL. In this paper we explore alternative term representations, complemented by clustering of morphological variants. We introduce a generic algorithmic scheme for thesaurus construction in MRL, and demonstrate the empirical benefit of our methodology for a Hebrew thesaurus.

Cited by 6 publications (5 citation statements)
References 29 publications
“…We plan to investigate additional aggregation methods and explore the impact of the individual models on the combined system to improve our system results. We also plan to try our system on other languages of different families, such as Semitic languages (Liebeskind and Liebeskind, 2020) and use LSC models to construct diachronic thesaurus, which bridges the lexical gap between modern and ancient language (Zohar et al, 2013; Liebeskind and Dagan, 2015; Liebeskind et al, 2016; Liebeskind et al, 2019).…”
Section: Discussion (mentioning)
confidence: 99%
“…Previous work reported that the available modern Hebrew morphological analyzing tools have poor performance on the Responsa corpus. For example, whereas the accuracy of a state-of-the-art modern Hebrew tagger on modern Hebrew text was over 90%, on the Responsa corpus it was only about 60% [Liebeskind et al, 2012]. Therefore, HaCohen-Kerner et al [2010] used the raw text and we followed the same approach.…”
Section: Conventional Machine-learning Methods (mentioning)
confidence: 99%
“…In the evaluation stage, they classified each related term into lemma based groups. Later, Liebeskind et al [2012] proposed a schematic methodology for generating a co-occurrence-based thesaurus for the Responsa project. They investigated three options for term representation: surface form, lemma, and multiple lemmas, supplemented with the clustering of term variants.…”
Section: The Responsa Corpus and Diachronic Tasks (mentioning)
confidence: 99%
“…Previously, most of the methods for lexical semantic change detection built co-occurrence matrices [70,84,109]. While in some cases, high-dimensional sparse matrices were used, in other cases, the dimensions of the matrices were reduced mainly using singular value decomposition (SVD) [156].…”
Section: Word Embeddings (mentioning)
confidence: 99%