2003
DOI: 10.1016/s0306-4573(02)00021-3

An information-theoretic perspective of tf–idf measures

Cited by 1,086 publications (493 citation statements)
References: 14 publications
“…For documents, a Fisher kernel measures how much two members of a collection are similar taking into account a whole corpus as background information. We show that the Fisher kernel based on the DCM has a mathematical form related to the well-known TF-IDF representation for documents [1]. This demonstration is a new approach towards explaining why the TF-IDF heuristic is justified and why it is so successful experimentally.…”
Section: Introduction
Citation type: mentioning
confidence: 75%
“…is the assignment of weights to terms in order to improve performance. This weighting step expresses the importance of a term within a document [5]. Term weighting therefore plays an important role in making the document representation accurate and effective.…”
Section: Related Work
Citation type: unclassified
“…In order to select the links in a graph that are most relevant based on the given start and destination nodes, we utilize an adapted variant of the TF/IDF [1] measure: PF/IRF. The PF/IRF measure reflects the importance of a predicate with respect to a resource in a dataset and is defined as follows:…”
Section: Domain Delineation
Citation type: mentioning
confidence: 99%