2010
DOI: 10.1007/978-3-642-15760-8_13

Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance

Abstract: This paper presents a quantitative performance analysis of two different approaches to the lemmatization of Czech text data. The first is based on a manually prepared dictionary of lemmas and a set of derivation rules, while the second is based on automatic inference of the dictionary and the rules from training data. The comparison is done by evaluating the mean Generalized Average Precision (mGAP) measure of the lemmatized documents and search queries in the set of information retrieval (IR)…
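To make the first approach more concrete, here is a minimal Python sketch of a dictionary-plus-rules lemmatizer. It assumes a hand-made lexicon that assigns each lemma to a derivation pattern, and patterns that expand a lemma into its inflected forms. The lexicon, the pattern names and the toy Czech endings are hypothetical illustrations, not the resource evaluated in the paper. The generated forms are inverted into a form-to-lemma index used at lookup time.

```python
from typing import Dict, List, Tuple

# Hypothetical derivation patterns: name -> (ending removed from the lemma to get
# the stem, endings appended to the stem to generate inflected forms).
# Toy, incomplete Czech noun paradigms, for illustration only.
RULES: Dict[str, Tuple[str, List[str]]] = {
    "hrad": ("", ["", "u", "em", "y", "ů", "ům", "ech"]),
    "žena": ("a", ["a", "y", "ě", "u", "o", "ou", "", "ám", "ách", "ami"]),
}

# Hypothetical manually prepared dictionary: lemma -> derivation pattern.
LEXICON: Dict[str, str] = {
    "hrad": "hrad",
    "most": "hrad",
    "žena": "žena",
    "ryba": "žena",
}


def build_form_index(lexicon: Dict[str, str],
                     rules: Dict[str, Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Expand every lemma into its forms and invert the result to form -> lemmas."""
    index: Dict[str, List[str]] = {}
    for lemma, pattern in lexicon.items():
        strip, endings = rules[pattern]
        stem = lemma[: len(lemma) - len(strip)]
        for ending in endings:
            lemmas = index.setdefault(stem + ending, [])
            if lemma not in lemmas:
                lemmas.append(lemma)
    return index


def lemmatize(token: str, index: Dict[str, List[str]]) -> str:
    """Return the first dictionary lemma for a token; unknown tokens pass through lowercased."""
    candidates = index.get(token.lower())
    return candidates[0] if candidates else token.lower()


INDEX = build_form_index(LEXICON, RULES)
print([lemmatize(t, INDEX) for t in "Ženy u mostu".split()])  # ['žena', 'u', 'most']
```

When one form belongs to several lemmas (common in Czech), this sketch simply returns the first candidate; a practical system would need some form of disambiguation, and its coverage is limited to what the manual dictionary and rules describe.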

Cited by 29 publications (21 citation statements)
References: 6 publications
“…The retrieval performance of this IR model can differ for various levels of interpolation; therefore, the λ parameter was set according to the experiments presented in [5] to the value yielding the best results, λ = 0.1.…”
Section: Query Likelihood Model (mentioning)
confidence: 99%
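The interpolation mentioned in this statement can be illustrated with a query likelihood scorer that linearly (Jelinek-Mercer) smooths the document model with the collection model. This is a minimal sketch under stated assumptions: the exact formula used in [5] is not given in this excerpt, so we assume that λ weights the collection model, P(t|d) = (1 − λ)·P_ml(t|d) + λ·P(t|C), and that documents and queries have already been lemmatized into token lists. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def query_log_likelihood(query: List[str], doc: List[str],
                         coll_tf: Counter, coll_len: int, lam: float) -> float:
    """log P(q|d) with linear interpolation; lam is ASSUMED to weight the collection model."""
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        if p_coll == 0.0:
            # Term absent from the whole collection: it would zero out every
            # document equally, so it is skipped to keep the ranking informative.
            continue
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        score += math.log((1.0 - lam) * p_doc + lam * p_coll)
    return score


def rank(query: List[str], docs: Dict[str, List[str]], lam: float = 0.1) -> List[str]:
    """Return document ids sorted by descending query likelihood."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1] so that every probability is positive")
    coll_tf = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(coll_tf.values())
    if coll_len == 0:
        return list(docs)  # nothing to score against
    scores = {doc_id: query_log_likelihood(query, tokens, coll_tf, coll_len, lam)
              for doc_id, tokens in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy lemmatized collection (hypothetical data).
DOCS = {"d1": ["hrad", "most", "hrad"], "d2": ["ryba", "most"], "d3": ["žena"]}
print(rank(["hrad", "most"], DOCS))  # ['d1', 'd2', 'd3']
```

With a small λ such as 0.1, the score is dominated by the document's own term frequencies, while the collection term still keeps documents that lack a query term from receiving a zero probability.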
“…As a result of these experiments, automatic text lemmatization is also applied in our work. The lemmatization module uses a lemmatizer described in [18]. The lemmatizer is automatically created from data containing pairs of full word form and base word form.…”
Section: System For Acquisition and Storing Data (mentioning)
confidence: 99%
“…The lemmatizer is automatically created from data containing pairs of full word form and base word form. A lemmatizer created in this way has been shown to be fully sufficient for the task of information retrieval [18].…”
Section: System For Acquisition and Storing Data (mentioning)
confidence: 99%
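The idea of a lemmatizer built automatically from (full word form, base word form) pairs can be sketched with simple suffix-replacement rules. This is a generic illustration, not the method of [18], whose details this excerpt does not give: for each training pair we strip the longest common prefix, record the rule "form ending → lemma ending", and at lookup time apply the most frequent rule for the longest matching ending. The class name and the toy Czech training pairs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class SuffixLemmatizer:
    """Hypothetical rule learner: form ending -> counts of lemma endings."""

    def __init__(self) -> None:
        self.rules: DefaultDict[str, Counter] = defaultdict(Counter)
        self.max_suffix = 0

    def fit(self, pairs: Iterable[Tuple[str, str]]) -> "SuffixLemmatizer":
        for form, lemma in pairs:
            form, lemma = form.lower(), lemma.lower()
            k = common_prefix_len(form, lemma)
            form_suffix, lemma_suffix = form[k:], lemma[k:]
            self.rules[form_suffix][lemma_suffix] += 1
            self.max_suffix = max(self.max_suffix, len(form_suffix))
        return self

    def lemmatize(self, word: str) -> str:
        word = word.lower()
        # Try the longest known ending first, always leaving at least one stem character.
        for n in range(min(self.max_suffix, len(word) - 1), -1, -1):
            suffix = word[len(word) - n:]
            if suffix in self.rules:
                lemma_suffix = self.rules[suffix].most_common(1)[0][0]
                return word[: len(word) - n] + lemma_suffix
        return word  # no rule at all (e.g. empty training data)


PAIRS = [("mostech", "most"), ("mostem", "most"), ("rybách", "ryba"),
         ("rybou", "ryba"), ("most", "most")]
lemmatizer = SuffixLemmatizer().fit(PAIRS)
print([lemmatizer.lemmatize(w) for w in ["hradech", "ženou", "Praha"]])  # ['hrad', 'žena', 'praha']
```

Note that "hradech" and "ženou" never occur in the training pairs; the learned endings generalize to them. Unlike the dictionary-based approach, no manual resource is needed, at the cost of occasionally producing base forms that are not real words.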
“…These methods were selected due to the good results in our information retrieval experiments [2], since we had no experience with the topic identification task so far.…”
Section: Identification Algorithms (mentioning)
confidence: 99%
“…The appropriate keywords from the first tier of the tree would then be politics & diplomacy, economy and health.² The first three lines of Table 2 thus describe the language models that were trained using the articles published between January 1st, 2009 and July 17th, 2010 and labeled with any keyword from the subtree with the headword politics & diplomacy; politics & diplomacy and economy; and politics & diplomacy, economy and health, respectively. The results for these topic-specific LMs are compared with the models trained from all the articles published in the defined period just prior to the broadcast day (lines 4 to 6).…”
Section: Language Modeling and ASR Experiments (mentioning)
confidence: 99%
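The training-data selection described in this statement (articles from a fixed date range labeled with a keyword from the subtree of one or more first-tier headwords, versus all articles from the same period) can be sketched as a simple filter. The keyword tree is hypothetical apart from the three headwords named in the quote, and the `Article` type and function names are assumptions; training the language models themselves is out of scope here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Optional, Sequence, Set


@dataclass(frozen=True)
class Article:
    published: date
    keywords: FrozenSet[str]
    text: str = ""


# Hypothetical keyword hierarchy: headword -> child keywords.
TREE: Dict[str, List[str]] = {
    "politics & diplomacy": ["elections", "foreign affairs"],
    "economy": ["finance", "industry"],
    "health": ["epidemics"],
}


def subtree(tree: Dict[str, List[str]], head: str) -> Set[str]:
    """All keywords reachable from head, including head itself."""
    found, stack = {head}, [head]
    while stack:
        for child in tree.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def select_training_articles(articles: Sequence[Article], start: date, end: date,
                             tree: Optional[Dict[str, List[str]]] = None,
                             headwords: Sequence[str] = ()) -> List[Article]:
    """Articles published in [start, end]; if headwords are given, keep only those
    labeled with a keyword from at least one of the headwords' subtrees."""
    in_period = [a for a in articles if start <= a.published <= end]
    if not headwords:
        return in_period  # baseline: every article from the period
    allowed: Set[str] = set()
    for head in headwords:
        allowed |= subtree(tree or {}, head)
    return [a for a in in_period if a.keywords & allowed]
```

Calling it with `headwords=["politics & diplomacy"]`, then `["politics & diplomacy", "economy"]`, then all three headwords would mirror the three topic-specific configurations, while omitting `headwords` gives the period-only baseline.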