David A. Grossman scite author profile

We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors. This represents 13% of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides valuable insight for improving retrieval effectiveness and efficiency. It is also relevant to the development of enhanced query disambiguation, routing, and caching algorithms.
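The matching step described in this abstract can be illustrated with a minimal sketch. It assumes exact, case-insensitive lookup of each logged query in editor-built category lists; the abstract does not spell out the matching rule, so that choice, the function name `hourly_category_counts`, the category names, and the sample log are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical editor-built lists: category -> set of query strings.
CATEGORY_LISTS = {
    "music": {"lyrics", "mp3 download"},
    "travel": {"cheap flights", "hotel deals"},
}

# Made-up log entries: (timestamp, query text).
SAMPLE_LOG = [
    (datetime(2000, 1, 1, 9, 15), "cheap flights"),
    (datetime(2000, 1, 1, 9, 40), "Lyrics"),
    (datetime(2000, 1, 1, 21, 5), "lyrics"),
    (datetime(2000, 1, 1, 21, 30), "weather tomorrow"),
]


def hourly_category_counts(log, category_lists):
    """Count matched queries per (hour, category) and report list coverage.

    Assumption: a query matches a category only if its lowercased,
    stripped text appears verbatim in that category's list.
    """
    # Invert the lists so each query maps to the categories containing it.
    query_to_categories = defaultdict(set)
    for category, queries in category_lists.items():
        for query in queries:
            query_to_categories[query.lower()].add(category)

    counts = defaultdict(Counter)  # hour of day -> Counter of category hits
    matched = 0
    total = 0
    for timestamp, query in log:
        total += 1
        categories = query_to_categories.get(query.strip().lower())
        if not categories:
            continue  # query is not on any editor list
        matched += 1
        for category in categories:
            counts[timestamp.hour][category] += 1

    # Fraction of log entries that matched at least one list.
    coverage = matched / total if total else 0.0
    return counts, coverage


counts, coverage = hourly_category_counts(SAMPLE_LOG, CATEGORY_LISTS)
for hour in sorted(counts):
    print(hour, dict(counts[hour]))
print("coverage:", coverage)
```

The `coverage` value plays the role of the 13% figure in the abstract: the share of traffic that the categorized lists account for. Comparing a category's hourly counts with the hourly totals for the whole log is then a matter of dividing one counter by the other.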

Building a test collection for complex document information processing

Lewis

et al. 2006

New Terminology and Clarifications for Diffserv

Grossman

2002

156

Temporal analysis of a very large topically categorized Web query log

Beitzel

Jensen

Chowdhury

et al. 2006

J. Am. Soc. Inf. Sci.

Web queries that constituted the total query traffic for a 6-month period of a general-purpose commercial Web search service. Previously, query logs were studied from a single, cumulative view. In contrast, this study builds on the authors' previous work, which showed changes in popularity and uniqueness of topically categorized queries across the hours in a day. To further their analysis, they examine query traffic on a daily, weekly, and monthly basis by matching it against lists of queries that have been topically pre-categorized by human editors. These lists represent 13% of the query traffic. They show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. Additionally, they show that certain categories of queries trend differently over varying periods. The authors' key contribution is twofold: They outline a method for studying both the static and topical properties of a very large query log over varying periods, and they identify and examine topical trends that may provide valuable insight for improving both retrieval effectiveness and efficiency.

Fusion of effective retrieval strategies in the same information retrieval system

Beitzel

Jensen

Chowdhury

et al. 2004

J. Am. Soc. Inf. Sci.

Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
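For readers unfamiliar with data fusion, the following is a minimal sketch of one widely used score-combination method, CombMNZ, applied to two result sets. The abstract does not name the fusion approaches the study evaluates, so CombMNZ is used here only as an illustration; the function names, document identifiers, and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one result set's scores to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def comb_mnz(result_sets):
    """Fuse result sets with CombMNZ.

    Each document's fused score is the sum of its normalized scores
    multiplied by the number of result sets that retrieved it.
    Returns document ids, best first; ties are broken by id.
    """
    summed = {}
    hits = {}
    for scores in result_sets:
        for doc, s in min_max_normalize(scores).items():
            summed[doc] = summed.get(doc, 0.0) + s
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: summed[doc] * hits[doc] for doc in summed}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Made-up scores from two retrieval strategies run in the same system.
strategy_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
strategy_b = {"d1": 0.9, "d2": 0.8, "d4": 0.1}

print(comb_mnz([strategy_a, strategy_b]))
```

In this toy input both strategies already rank `d1` and `d2` first, so the fused ranking keeps the same top documents. That is only an illustration of how fusion can leave a ranking unchanged when the inputs agree, not a reproduction of the study's analysis.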

Hourly analysis of a very large topically categorized web query log
