Conglei Yao scite author profile

The goal of result diversification is to maximize the coverage of query subtopics while minimizing the redundancy in the search results. Intuitively, it is more desirable for a diversification system to cover independent subtopics, since it would then retrieve non-overlapping sets of relevant documents, which leads to less redundancy in the search results. Unfortunately, existing diversification methods assume that query subtopics are independent and ignore their relations in the diversification process. To overcome this limitation, we propose to exploit concept hierarchies to extract query subtopics and infer their relations. We then apply axiomatic approaches to derive a structural diversification method that can leverage the subtopic relations in result diversification. Experimental results over an enterprise collection show that the relations among query subtopics are useful for improving diversification performance.
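As a rough illustration of the coverage-versus-redundancy trade-off described in this abstract, the following is a minimal sketch of a generic greedy reranker. It is not the structural method the abstract proposes: it treats subtopics as independent, which is exactly the simplification the abstract argues against. The function name `greedy_diversify`, the scoring rule, and the sample data are all hypothetical.

```python
def greedy_diversify(relevance, doc_subtopics, k):
    """Greedily pick up to k documents, favouring those that add unseen subtopics.

    relevance: dict mapping doc id -> relevance score (assumed given).
    doc_subtopics: dict mapping doc id -> set of subtopic labels (assumed given).
    """
    selected = []
    covered = set()
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def gain(doc):
            # Reward relevance in proportion to the number of new subtopics;
            # ties are broken by relevance, then by doc id for determinism.
            new_topics = doc_subtopics.get(doc, set()) - covered
            return (relevance[doc] * len(new_topics), relevance[doc], doc)

        best = max(candidates, key=gain)
        selected.append(best)
        covered |= doc_subtopics.get(best, set())
        candidates.remove(best)
    return selected


# Hypothetical sample data: d1 and d2 cover the same subtopic.
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.4}
doc_subtopics = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}, "d4": {"c"}}
top3 = greedy_diversify(relevance, doc_subtopics, 3)
```

In this sample, the redundant document `d2` is pushed below less relevant documents that cover new subtopics. A method that models subtopic relations would replace the plain set difference with a relation-aware notion of novelty.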


Focused crawling using navigational rank

Feng, Zhang, Xiong, et al. 2010


The goal of focused crawling is to use limited resources to effectively discover web pages related to a specific topic rather than downloading all accessible web documents. The major challenge in focused crawling is how to effectively determine each hyperlink's capability of leading to target pages. To compute this capability, we present a novel approach called Navigational Rank (NR). In general, NR is a two-step, two-direction credit propagation approach. Compared to existing methods, NR has three main advantages. First, NR is dynamically updated during the crawling process, so it adapts well to different website structures. Second, when the crawling seed is far away from the target pages, and the target pages constitute only a small portion of the whole website, NR shows a significant performance advantage. Third, NR computes each link's capability of leading to target pages by considering both the target and non-target pages it leads to. This global knowledge yields better performance than using target pages alone. We have performed extensive experiments to evaluate the proposed approach using two groups of large-scale, real-world datasets from two different domains. The experimental results show that our approach is domain-independent and significantly outperforms the state of the art.
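To make the setting concrete, here is a minimal sketch of a generic best-first crawl frontier over an in-memory link graph. It is not an implementation of Navigational Rank: the abstract does not give the credit propagation rules, so the link score here is simply a caller-supplied function. The names `best_first_crawl`, `link_score`, `is_target`, and the toy site are hypothetical.

```python
import heapq


def best_first_crawl(graph, seed, link_score, is_target, budget):
    """Fetch at most `budget` pages, always following the highest-scored link.

    graph: dict mapping page -> list of linked pages (stands in for the web).
    link_score: function (source, destination) -> float, higher is better.
    is_target: function page -> bool.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    found = []
    fetched = 0
    while frontier and fetched < budget:
        _, page = heapq.heappop(frontier)
        fetched += 1
        if is_target(page):
            found.append(page)
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-link_score(page, nxt), nxt))
    return found


# Hypothetical toy site: product pages are the targets.
graph = {"home": ["news", "products"], "products": ["p1", "p2"], "news": ["n1"]}
targets = {"p1", "p2"}
score = lambda src, dst: 1.0 if dst.startswith("p") else 0.1
found = best_first_crawl(graph, "home", score, lambda p: p in targets, budget=4)
```

With a budget of four fetches, the crawl reaches both target pages without visiting the news branch. A dynamically updated score, as the abstract describes for NR, would require re-scoring frontier entries as pages are fetched, which this sketch does not attempt.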


Search result diversification for enterprise data

Zheng, Fang, Yao, et al. 2011


Leveraging integrated information to extract query subtopics for search result diversification

et al. 2013



Conglei Yao

Acronym extraction and disambiguation in large-scale organizational web pages

Exploiting concept hierarchy for result diversification

Focused crawling using navigational rank

Search result diversification for enterprise data

Leveraging integrated information to extract query subtopics for search result diversification
