Improvement of HITS-based algorithms on web documents

Li, Longzhuang; Shang, Yi; Zhang, Wei

doi:10.1145/511511.511514

Cited by 20 publications

(24 citation statements)

References 2 publications

“…As for the page ranking, PageRank [11], HITS [12], or other classical ranking algorithms can be used to rank the pages by the degree of relevance with the retrieval keywords, ensuring that users can get the information they need as quickly as possible.…”

Section: A. The Organization of Search Results (mentioning)

confidence: 99%

Detection and optimized disposal of near-duplicate pages

Qiu

Qian

2010

2010 2nd International Conference on Future Computer and Communication

Search engines are an important tool for users to access network information resources. However, a large number of duplicate and near-duplicate pages adds to the user's burden. Currently, search engines only remove duplicate pages and do not yet have effective strategies for detecting and disposing of near-duplicate pages. This paper analyzes the existing algorithms to select an appropriate one for detecting near-duplicate pages, and optimizes the disposal strategy to ensure that near-duplicate pages do not take up too much space in search results while still being used effectively. These measures allow users to retrieve needed information more easily.

“…Li et al [22] found that HITS is vulnerable to the "small-in-large-out" situation. Gyongyi et al describe a new algorithm, TrustRank, to combat Web spam [17].…”

Section: Related Work (mentioning)

confidence: 99%

“…However with the appearance of link farms, in which sites are densely interconnected, HITS is no longer robust [5,21,22,8]. For example, the top 10 authorities generated by HITS for query weather are shown in Table 1.…”

Section: Introduction (mentioning)

confidence: 99%

Undue influence

Davison

2006

Proceedings of the 2006 ACM Symposium on Applied Computing

Link farm spam and replicated pages can greatly degrade link-based ranking algorithms such as HITS. In order to identify and neutralize link farm spam and replicated pages, we look for sufficient material copied from one page to another. In particular, we focus on the use of "complete hyperlinks" to distinguish link targets by the anchor text used. We build and analyze the bipartite graph of documents and their complete hyperlinks to find pages that share anchor text and link targets. Link farms and replicated pages are identified in this process, permitting the influence of problematic links to be reduced in a weighted adjacency matrix. Experiments and user evaluations show significant improvement in the quality of results produced using HITS-like methods.

“…Methods based on mutual reinforcement principle have been widely reported in literature especially in the domains of journal evaluation and more recently on Web search [4,14,17,21,13].…”

Section: Related Work Using Mutual Reinforcement (mentioning)

confidence: 99%

“…Google's pagerank of a particular Web-page is a measure of its standing based on its link structure [4]. In [17], modification of HITS by assigning a weight to each link based on textual similarities between pages has been found to perform better than the original HITS.…”

Section: Related Work Using Mutual Reinforcement (mentioning)

confidence: 99%
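
The citation statement above refers to a HITS variant in which each link carries a weight. As a rough illustration only, the following sketch runs the standard HITS hub/authority iteration over weighted links. The function name `weighted_hits`, the helper `_normalize`, and the example weights are hypothetical; the similarity measure that the cited work uses to derive its weights is not given on this page and is not reproduced here.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def weighted_hits(weights, iterations=50):
    """Run HITS on a weighted link graph.

    weights: dict mapping (source, target) -> non-negative link weight.
             The weights are an assumption of this sketch; plain HITS
             corresponds to every weight being 1.0.
    Returns (authority, hub) dicts keyed by page.
    """
    nodes = sorted({page for edge in weights for page in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: weighted sum of hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in weights.items():
            new_auth[dst] += w * hub[src]
        # Hub score of a page: weighted sum of authority scores it links to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in weights.items():
            new_hub[src] += w * new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Hypothetical graph: the a->d link is given a low weight.
    links = {("a", "c"): 1.0, ("b", "c"): 1.0, ("a", "d"): 0.1}
    authority, hubs = weighted_hits(links)
    for page in sorted(authority, key=authority.get, reverse=True):
        print(page, round(authority[page], 4), round(hubs[page], 4))
```

In this toy graph, the low weight on the link from `a` to `d` limits how much authority `d` receives, which is the general effect link weighting is meant to have.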

The story picturing engine

Joshi

Wang

2004

Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval

In this paper, we present an approach to automated story picturing based on the mutual reinforcement principle. Story picturing refers to the process of illustrating a story with suitable pictures. In our approach, semantic keywords are extracted from the story text and an annotated image database is searched to form an initial picture pool. Thereafter, a novel image ranking scheme automatically determines the importance of each image. Both lexical annotations and visual content of an image play a role in determining its rank. Annotations are processed using WordNet to derive a lexical signature for each image. An integrated region-based similarity is also calculated between each pair of images. An overall similarity measure is formed using lexical and visual features. In the end, a mutual-reinforcement-based rank is calculated for each image using the image similarity matrix. We also present a human behavior model based on a discrete state Markov process which captures the intuition for our technique. Experimental results have demonstrated the effectiveness of our scheme.
