Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL - ACL '06 2006
DOI: 10.3115/1220175.1220251

A comparison of document, sentence, and term event spaces

Abstract: The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. Despite this trend, systems continue to model language at a document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average IS…
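To make the event spaces in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common log(N / n_t) form of inverse frequency, where N is the number of units (documents or sentences) and n_t is the number of units containing term t. The function name `inverse_frequency`, the toy corpus, and the naive period-based sentence splitter are all assumptions made for illustration. ITF is left out because the truncated abstract does not state its definition.

```python
import math
import re
from collections import Counter

def inverse_frequency(units):
    """Return {term: log(N / n_t)} over a list of text units.

    N is the number of units and n_t is the number of units that
    contain the term at least once. Passing documents gives an
    IDF-style score; passing sentences gives an ISF-style score.
    """
    n_units = len(units)
    containing = Counter()
    for unit in units:
        # Count each term once per unit, regardless of repetitions.
        containing.update(set(re.findall(r"[a-z]+", unit.lower())))
    return {term: math.log(n_units / n) for term, n in containing.items()}

# Hypothetical two-document corpus.
docs = [
    "The cat sat. The cat slept.",
    "The dog barked.",
]

# Naive sentence split on periods; an assumption for illustration only.
sentences = [s.strip() for d in docs for s in d.split(".") if s.strip()]

idf = inverse_frequency(docs)       # event space: documents
isf = inverse_frequency(sentences)  # event space: sentences

for term in sorted(idf):
    print(f"{term:8s} idf={idf[term]:.3f} isf={isf[term]:.3f}")
```

The same function serves both event spaces; only the unit of counting changes, which is the contrast the abstract draws between document-level and sentence-level modelling.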

Cited by 20 publications (22 citation statements)
References 13 publications
“…PCP2P relies on the fact that term frequencies follow a Zipf distribution. Although it is accepted that document collections follow Zipf distribution, different document collections exhibit different distribution skews [1]. For example, the RCV1 collection used in our experiments has a skew of 0.55, while values reported in the literature for other text collections are around 1.0 [1,17].…”
Section: Evaluation Results (mentioning)
confidence: 90%
“…Although it is accepted that document collections follow Zipf distribution, different document collections exhibit different distribution skews [1]. For example, the RCV1 collection used in our experiments has a skew of 0.55, while values reported in the literature for other text collections are around 1.0 [1,17]. To evaluate the influence of the skew on PCP2P, we used SYNTH collections generated with different Zipf skew factors, between 0.5 and 1.1.…”
Section: Evaluation Results (mentioning)
confidence: 99%
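The skew values quoted in these statements (0.55 for RCV1, around 1.0 elsewhere, 0.5 to 1.1 for the synthetic collections) refer to the exponent s in a Zipf rank-frequency law, where the frequency of the term at rank r is proportional to 1 / r^s. The sketch below is an illustration under that assumption, not the citing paper's generator. The names `zipf_probabilities` and `estimate_skew` and the vocabulary size of 1000 are hypothetical.

```python
import math

def zipf_probabilities(vocab_size, skew):
    """Probability of each rank 1..vocab_size under p(r) ~ 1 / r**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def estimate_skew(frequencies):
    """Estimate the skew as the negated least-squares slope of
    log(frequency) against log(rank)."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Compare how much probability mass the top-ranked term takes
# at the skew values mentioned in the citation statements.
for skew in (0.5, 0.55, 1.0, 1.1):
    probs = zipf_probabilities(1000, skew)
    print(f"skew={skew:.2f} top-term share={probs[0]:.4f} "
          f"recovered skew={estimate_skew(probs):.3f}")
```

A lower skew spreads probability mass more evenly across the vocabulary, which is why a collection with skew 0.55 behaves differently from one near 1.0. On real term counts the log-log fit is only approximate, so the estimate should be read as a rough indicator.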
“…For this we assume, as is common in IR [22,23], that the dictionary size of each peer follows Heap's law [23]. We denote the length of a document, i.e., the number of words it consists of, as len(d), and the length of a document collection $\mathrm{len}(DC(p)) = \sum_{d \in DC(p)} \mathrm{len}(d)$.…”
Section: Cost Comparison (mentioning)
confidence: 99%
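The quantities in this statement can be sketched as follows. The collection length is the sum of the document lengths, and the dictionary size is then estimated with the usual form of Heaps' law, V(n) = K · n^β. The constants `k=30.0` and `beta=0.5` are placeholder values chosen for illustration and are not taken from the citing paper. The function names and the whitespace tokenizer are also assumptions.

```python
def doc_len(document):
    """len(d): number of words in a document (whitespace tokenization)."""
    return len(document.split())

def collection_len(collection):
    """len(DC(p)): sum of len(d) over all documents d held by a peer."""
    return sum(doc_len(d) for d in collection)

def heaps_vocabulary(n_words, k=30.0, beta=0.5):
    """Estimated dictionary size under Heaps' law, V(n) = k * n**beta.

    k and beta are hypothetical defaults; in practice they are fitted
    to the collection at hand.
    """
    return k * n_words ** beta

# Hypothetical peer holding three short documents.
peer_collection = ["a b c d", "e f", "g h i"]

n = collection_len(peer_collection)
print("collection length:", n)
print("estimated dictionary size:", heaps_vocabulary(n))
```

The toy collection is far too small for the estimate to be meaningful, since Heaps' law describes vocabulary growth in large collections. The example only shows how the two quantities connect.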
“…Blake [13] compares different language models and speculates on using ISF for the systems aiming at sentence extraction and inverse term frequency (ITF) for the systems identifying terms as their smallest compositional unit. Thus for the TS systems based on unigrams ITF could be a reliable method to select the sentences for the final summary.…”
Section: Inverse Term and Sentence Frequencies (mentioning)
confidence: 99%