2011
DOI: 10.1002/asi.21597

Term weighting based on document revision history

Abstract: In real-world information retrieval systems, the underlying document collection is rarely stable or definite. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose four term weighting functions that use each document's history to estimate a current term …

Cited by 9 publications (8 citation statements)
References 10 publications
“…The aim is to extract the significant terms from a set of document versions which bring changed and novel information into an article. Research has been done on term weighting using the time impact on changes occurring either in the collection (Efron, 2010) or in an individual document (Nunes, Ribeiro, & David, 2011). Instead of directly taking into account the periods of time in which a term occurs throughout an article's revision history, the score of a term is calculated here by considering the joint probabilities of both the insertion and deletion events occurring in a set of document versions.…”
Section: Approach-ii: Temporal Sentence Score (Tss)
Citation type: mentioning
confidence: 99%
“…However, the proposed model was shown to be outperformed by the baselines for ambiguous queries which spans multiple topics such as the query "earthquake prediction" that include geology, natural disasters, etc. Following the same direction, Nunes et al [48] used the same document's revision history as a source of temporal evidence and propose several different term-weighting measures. The basic idea of the approach is to give higher weight values to terms that have existed for a longer time in a document, since its first version should be valued higher than a term that was introduced only in the latest revision made.…”
Section: Time At the Ranking Level
Citation type: mentioning
confidence: 99%
“…Reference [8] (12) In this measure the impact of the other terms of document d has been considered, too.…”
Section: Term Weighting Based On Document Revision History
Citation type: mentioning
confidence: 99%
“…For example, the Wikipedia pages have different versions that these versions are created by different people to improve the content of the pages. Previous researches [1][2][3][4][5][6][7][8][9][10][11] have shown that investigating on these changes can improve the efficiency of information retrieval systems.…”
Section: Introduction
Citation type: mentioning
confidence: 99%