This article reports on the development of a novel method for the analysis of Web logs. The method uses techniques that look for similarities between queries and identify sequences of "query transformation". It allows sequences of query transformations to be represented as graphical networks, thereby giving a richer view of search behavior than is possible with the usual sequential descriptions. We also perform a basic analysis to study the correlations between observed transformation codes, with results that appear to show evidence of behavior habits. The method was developed using transaction logs from the Excite search engine to provide a tool for an ongoing research project that is endeavoring to develop a greater understanding of Web-based searching by the general public.
Introduction
The research reported here is part of a larger experimental project designed to develop an understanding of how, and how effectively, the general public search for information on the Web. The long-term aim of this study is to build an evidence-based model of effective searching that should inform the design of training and intelligent adaptive search interfaces. The evidence involved will include audio transcriptions, cognitive style scores, background questionnaire results, and search appraisal scores from our volunteer subjects as well as an analysis of their query transformation patterns. Query transformations are defined here as the linguistic and conceptual strategic changes searchers make as they repeatedly reformulate their search queries in response to ongoing failure and success. The rationale of the current project is based on the hypothesis that by studying query transformations in relation to other data, we can gain insights into the essential generic strategies that may result in more, and less, effective Web searching.
In this article, we focus on the development of methods for the extraction and analysis of query transformation types and, to this end, an analysis has been conducted using transaction logs from the Excite search engine dating from 2001. Although these are not current, they represent a typical record of queries from that time period with many of the syntactic changes that will be of interest to us in our empirical studies. They are, moreover, well known and extensively studied (Spink, Wolfram, Jansen, & Saracevic, 2001). Although the lack of context places a limit on their usefulness, search logs do provide abundant data relating to syntactic changes and form an excellent training ground for the development of our analytical methods. The resulting analysis can be applied to our own empirical search data and used in conjunction with qualitative results to aid the development of our model.
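To make the notion of a query transformation concrete, the following minimal sketch codes the syntactic change between consecutive queries in a session by comparing their term sets, and then counts how often one code follows another. It is an illustration only: the code labels (`add_terms`, `remove_terms`, `replace_terms`, `repeat`, `new`), the function names, and the sample session are assumptions made for this example and are not the transformation codes or the similarity measures used in the study.

```python
from collections import Counter


def classify_transformation(prev_query, curr_query):
    """Return an illustrative code for the change from prev_query to curr_query.

    The labels are hypothetical and based only on shared, added, and removed terms.
    """
    prev_terms = set(prev_query.lower().split())
    curr_terms = set(curr_query.lower().split())
    added = curr_terms - prev_terms
    removed = prev_terms - curr_terms
    shared = prev_terms & curr_terms

    if not shared:
        return "new"            # no terms in common: treated as a fresh query
    if not added and not removed:
        return "repeat"         # same set of terms
    if added and not removed:
        return "add_terms"      # terms were only added
    if removed and not added:
        return "remove_terms"   # terms were only dropped
    return "replace_terms"      # some terms kept, others swapped


def code_session(queries):
    """Code each consecutive pair of queries in one search session."""
    return [classify_transformation(a, b) for a, b in zip(queries, queries[1:])]


def transition_counts(codes):
    """Count how often one transformation code is followed by another."""
    return Counter(zip(codes, codes[1:]))


# Hypothetical session, invented for illustration.
session = [
    "cheap flights",
    "cheap flights london",
    "flights london",
    "hotels paris",
    "hotels paris",
]
codes = code_session(session)
transitions = transition_counts(codes)
```

The pairs counted by `transition_counts` can be read as weighted edges between transformation codes, which is one simple way to move from a sequential description of a session toward a network view of it.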
Related Work
A significant body of research into Web searching now exists (Spink & Jansen, 2004). The motivation for such studies includes the exploitation of generic knowledge of effective searching to provide better training and the development of intelligent search engines that will proactively assist the sear...