Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia (HYPERTEXT '03), 2003
DOI: 10.1145/900095.900096

Refinement of TF-IDF schemes for web pages using their hyperlinked neighboring pages

Cited by 22 publications (27 citation statements)
“…The cosine TFIDF weighting scheme is widely used in IR to determine the similarity between two documents [2, 29, 30, 31]. However, its precision is not very high [34, 35]. In this paper, we use it as a rough metric of similarity for the web pages in the CW dataset.…”
Section: About the Cosine TFIDF Metric
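For reference, the cosine TF-IDF similarity mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the weighting used by the citing work or by the cited paper: it assumes whitespace tokenization, raw term counts for TF, and idf = log(N / df), and the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumes raw term counts for TF and idf = log(N / df); other variants
    (log-scaled TF, smoothed IDF) are equally common.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

On the documents "web page ranking with links", "ranking web pages by links", and "file system search", the first two share weighted terms and get a positive cosine, while the first and third share none and score 0.0. Note that a term occurring in every document gets weight 0 under this IDF variant.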
“…We also plan to investigate additional IDF evaluation techniques, such as estimation based on limited crawls of hyperlinked neighboring pages [26]. Scraping Google for IDF values is not a viable long-term strategy, and at the very least we have not considered multi-lingual support in our prototype.…”
Section: IDF
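A bounded neighborhood crawl of the kind described above might be sketched as follows. This is a hypothetical illustration, not the procedure of [26]: the link graph and page texts are in-memory dictionaries standing in for real HTTP fetches, the depth and page limits are arbitrary, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 is one common choice among several. The names `crawl_neighborhood` and `estimate_idf` are made up for this sketch.

```python
import math
from collections import Counter, deque


def crawl_neighborhood(seed, links, max_depth=1, max_pages=50):
    """Breadth-first walk over an in-memory link graph, starting at seed.

    `links` maps a page id to the ids it links to. Returns page ids in
    discovery order, seed first. No network access is performed; this
    stands in for a real, bounded crawl.
    """
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue and len(order) < max_pages:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for nxt in links.get(page, []):
            if nxt not in seen and len(order) < max_pages:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def estimate_idf(pages, texts):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, over the sampled pages only."""
    n = len(pages)
    df = Counter(term for p in pages for term in set(texts[p].lower().split()))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
```

For example, with `links = {"a": ["b", "c"], "b": ["d"]}`, a depth-1 crawl from `"a"` returns `["a", "b", "c"]`, and a term that appears on every sampled page receives the minimum IDF of 1.0.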
“…As we have discussed earlier, FileRank is used by Eureka to scale the IR rankings of search results and bias them toward the more "important" files. Our approach is inspired by Hypertext [6,7] and Web-based [2,5] techniques, where the importance of a document is determined by the number and type of links that reach it. More formally, our technique performs a random walk over the semantic file graph where the probability of traversing a link is proportional to its weight.…”
Section: FileRank Computation
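The weighted random walk described above is closely related to PageRank with weight-proportional transitions. The following is a minimal power-iteration sketch under stated assumptions, not Eureka's implementation: the damping factor of 0.85, the uniform redistribution of mass from nodes without out-links, the convergence tolerance, and the function name `weighted_rank` are all choices made here for illustration.

```python
def weighted_rank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Stationary scores of a random walk with restarts on a weighted graph.

    graph maps node -> {neighbor: weight}. From each node the walker follows
    an out-link with probability proportional to its weight; with
    probability 1 - damping it jumps to a uniformly random node.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation mass, spread evenly over all nodes.
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            out = graph.get(v, {})
            total = sum(out.values())
            if total == 0:
                # No outgoing weight: spread this node's mass uniformly below.
                dangling += rank[v]
                continue
            for u, w in out.items():
                new[u] += damping * rank[v] * w / total
        for v in nodes:
            new[v] += damping * dangling / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For example, with a hypothetical graph in which `report.tex` links to `fig.png` with weight 3.0 and to `notes.txt` with weight 1.0, and both link back to `report.tex`, the scores sum to 1, `report.tex` scores highest, and `fig.png` outranks `notes.txt` because it receives three times the transition weight.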
“…This paper describes Eureka, a file system search engine that employs a "structured" view of the world in order to improve the effectiveness of file searches. Eureka is inspired by research in the Web [2,5] and Hypertext [6,7] communities, which has shown that the overall structure in a collection of hyper-linked documents can play an important role in determining the importance and ranking of different documents. Based on this intuition, we develop a framework for inferring semantic links in a file system, thus transforming a "flat" collection of files into a graph of hyper-linked documents, and quantifying the importance of each file based on the characteristics of this semantic graph.…”
Section: Introduction