Markovian analysis for automatic new topic identification in search engine transaction logs

Özmutlu, H. Cenk

doi:10.1002/asmb.758

Cited by 5 publications

(14 citation statements)

References 74 publications


“…The large majority of the studies on query modification used search logs of search engines for textual content. Probably the most studied search logs are those of the Excite search engine (Bozzon et al, 2007; Lau & Horvitz, 1999; Özmutlu, 2009; Rieh & Xie, 2006; Whittle et al, 2007). Other studies analyzed logs of Dogpile (Jansen et al, 2009), Tumba (Costa & Seco, 2008), AOL (Huang & Efthimiadis, 2009), Fast (Özmutlu, 2009), and Yahoo!…”

Section: Related Work (mentioning)

confidence: 99%

Semantic search log analysis: A method and a study on professional image search

Hollink

Tsikrika

Vries

2011

J. Am. Soc. Inf. Sci.


Existing methods for automatically analyzing search logs describe search behavior on the basis of purely syntactic differences (overlapping terms) between queries. Although these syntactic statistics provide valuable insights into the complexity and success of search interactions, they offer only a limited interpretation of the observed searching behavior, because they do not consider the semantics of users' queries. Recently, large amounts of semantic information have become publicly available in the form of linked data. In this paper we propose a method that exploits this information to enrich search queries with linked data entities, so as to determine the semantic types of the queries and the relations between queries that are entered consecutively in a search session. This work also provides an in-depth analysis of the search logs of the commercial picture portal of a European news agency, which offers access to photographic images to professional users. Compared to previous image search log analyses, in particular those of professional users, we consider a much larger dataset. We analyze the logs both in the more traditional syntactic way and with the newly proposed semantic approach, and compare the results. Our findings show the benefits of using semantics for search log analysis: the identified types of query modifications cannot be appropriately analyzed with a purely statistical approach that only considers term overlap, since queries related in the most frequent ways do not usually share terms. We discuss implications of our findings for improving log analysis, image collection management, and search engine design.
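To make the "purely syntactic" baseline concrete, the sketch below labels each pair of consecutive queries using term overlap alone. It is a minimal illustration, assuming whitespace tokenization and a simple class scheme (Repetition, New, Specialization, Generalization, Reformulation); it is not the exact definition used by Hollink et al. or by any of the cited logs studies, and the function names are hypothetical.

```python
def terms(query: str) -> set:
    # Lowercase and split on whitespace; no stemming or stop-word removal.
    return set(query.lower().split())


def classify_pair(previous: str, current: str) -> str:
    """Label the move from `previous` to `current` using only shared terms."""
    a, b = terms(previous), terms(current)
    if a == b:
        return "Repetition"
    if not a & b:
        return "New"  # no shared terms: treated as a topic shift
    if a < b:
        return "Specialization"  # terms were only added
    if b < a:
        return "Generalization"  # terms were only removed
    return "Reformulation"  # some terms kept, some replaced


if __name__ == "__main__":
    # Hypothetical session, not taken from any real log.
    session = ["obama", "obama speech", "obama", "president speech"]
    for prev, cur in zip(session, session[1:]):
        print(f"{prev!r} -> {cur!r}: {classify_pair(prev, cur)}")
```

The last pair ("obama" followed by "president speech") is labeled New even though a human would see the two queries as related, which is exactly the limitation of term-overlap analysis that the abstract describes.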



“…Hence, Dempster-Shafer was tested under five different configurations detailed in [50,53]. According to Özmutlu and Çavdur [50] the parameters obtained for one particular dataset are not necessarily the most successful ones to segment that dataset and the results obtained by this author confirm this claim.…”

Section: Results (mentioning)

confidence: 70%
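The Dempster-Shafer approach mentioned in this citation statement fuses evidence from several clues (for example, the time gap and the term overlap between two queries) into a single decision. The sketch below applies Dempster's rule of combination over the frame {shift, same}. The mass values are purely hypothetical and are not the configurations of [50,53]; the variable names are illustrative only.

```python
from itertools import product

SHIFT, SAME = "shift", "same"
THETA = frozenset({SHIFT, SAME})  # whole frame: mass here expresses ignorance


def combine(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass of focal sets contained in the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


# Hypothetical evidence: a long time gap points towards a topic shift,
# while shared terms point towards the same topic.
temporal = {frozenset({SHIFT}): 0.6, THETA: 0.4}
lexical = {frozenset({SAME}): 0.5, THETA: 0.5}
fused = combine(temporal, lexical)
```

With these masses, 0.3 of the product mass is conflicting, and after renormalization belief in a shift (3/7) exceeds belief in the same topic (2/7), so this pair would be flagged as a shift. As the quoted passage notes, such mass assignments tuned on one dataset need not transfer to another.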

“…Since then, they and their colleagues have revisited the Dempster-Shafer method [53] and studied the feasibility of additional ones: neural networks [51,52], multiple linear regression [48,55], Monte-Carlo simulation [49] and conditional probabilities [54].…”

Section: Machine-learning Methods To Combine Temporal and Lexical Clues (mentioning)

confidence: 99%

“…New, Reformulation, Specialization, or Generalization). To apply this technique several probabilities and parameters must be provided and, thus, the settings described in [50,53] were used for the experiments. Özmutlu and Buyuk [49] further elaborated the idea of using conditional probabilities by means of Monte-Carlo simulation.…”

Section: Methods Relying On Both Temporal and Lexical Clues (mentioning)

confidence: 99%
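In the spirit of the conditional-probability and Markovian analyses referred to above, the sketch below estimates first-order transition probabilities P(next class | current class) between the query classes New, Reformulation, Specialization and Generalization, and then draws a Monte-Carlo sample sequence from the estimated chain. It is a minimal illustration under the assumption that each session has already been labeled with these classes; it does not reproduce the exact models of Özmutlu and colleagues, and the function names and training sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Query classes named in the citation statement above.
STATES = ["New", "Reformulation", "Specialization", "Generalization"]


def transition_matrix(sequences):
    """Estimate P(next | current) by counting consecutive class pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for s in STATES:
        total = sum(counts[s].values())
        # Rows with no observed transitions are left as all zeros.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in STATES}
    return matrix


def simulate(matrix, start, length, rng):
    """Monte-Carlo draw of a class sequence from the estimated chain."""
    seq = [start]
    while len(seq) < length:
        row = matrix[seq[-1]]
        if not any(row.values()):
            break  # no observed outgoing transitions from this class
        seq.append(rng.choices(STATES, weights=[row[t] for t in STATES])[0])
    return seq


if __name__ == "__main__":
    # Hypothetical labeled sessions.
    sessions = [
        ["New", "Specialization", "Reformulation", "New"],
        ["New", "Specialization", "Generalization"],
        ["New", "Reformulation", "New"],
    ]
    P = transition_matrix(sessions)
    print(P["New"])
    print(simulate(P, "New", 8, random.Random(0)))
```

Counting and normalizing is the maximum-likelihood estimate for a first-order chain; classes never seen as a "current" state simply end the simulated sequence.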


A survey on session detection methods in query logs and a proposal for future evaluation

Gayo-Avello

2009

Information Sciences


Search engine logs provide highly detailed insight into users' interactions. Hence, they are both extremely useful and sensitive. The datasets publicly available to scholars are, unfortunately, too few, too dated and too small. There are few because search engine companies are reluctant to release such data; they are dated because they were collected in the late 1990s or early 2000s; and they are small because they comprise data for at most one day and just a few hundred thousand users. Even worse, the large query log disclosed by AOL in 2006 caused more harm than good because of a big privacy flaw. In this paper the author provides an overall view of the possible applications of query logs, the privacy concerns researchers must face when working on such datasets, and several ways in which query logs can be easily sanitized. One such measure consists of segmenting the logs into short topical sessions. Therefore, the author offers a comprehensive survey of session detection methods, as well as a thorough description of a new evaluation framework with performance results for each of the different methods. Additionally, a new and simple session detection method is proposed that outperforms the surveyed methods. It is a heuristic technique that flags a topic shift on the basis of a geometric interpretation of both the time gap between queries and the similarity between them.
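A geometric heuristic of this kind can be sketched by placing each pair of consecutive queries at a point whose coordinates are temporal closeness and lexical similarity, and flagging a shift when the point lies close to the origin (far apart in time and dissimilar in content). The sketch below is only an assumption about how such a rule could look: the 30-minute window, the unit radius, the Jaccard similarity and all function names are hypothetical choices, not the parameters or definitions from the survey.

```python
import math


def jaccard(q1: str, q2: str) -> float:
    """Term-set Jaccard similarity of two queries (whitespace tokens)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def temporal_closeness(gap_seconds: float, window_seconds: float = 1800.0) -> float:
    """Map a time gap to [0, 1]: 1 for an immediate query, 0 beyond the window."""
    return max(0.0, 1.0 - gap_seconds / window_seconds)


def is_topic_shift(prev_query: str, query: str, gap_seconds: float,
                   window_seconds: float = 1800.0, radius: float = 1.0) -> bool:
    """Flag a shift when the point (closeness, similarity) lies inside the circle."""
    t = temporal_closeness(gap_seconds, window_seconds)
    s = jaccard(prev_query, query)
    return math.hypot(t, s) < radius
```

For example, "obama" followed by "obama speech" ten seconds later lies outside the unit circle (closeness about 0.99, similarity 0.5) and is kept in the same session, whereas the same pair an hour apart lies at (0, 0.5) and is flagged as a shift. The circular boundary lets strong evidence on one axis compensate for weak evidence on the other.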


“…For the runs with T = 7 factors, we simply omitted the estimated topics with the smallest estimated probabilities from Equation (2) in the calculations in Equations (3)-(6). We used an enumerative search to identify the best match between estimated and true topics in Equations (3) and (4). In our analysis, we found that all results for the KL divergence responses (3) and (5) are the same as for the RMS distance responses (4) and (6).…”

Section: Numerical Study (mentioning)

confidence: 90%
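The enumerative matching described in this citation statement can be illustrated as follows: try every assignment of estimated topics to true topics, score each assignment by the average distance between matched word distributions, and keep the best. Equations (3)-(6) of the cited paper are not reproduced here, so the averaged KL divergence and RMS distance below are assumptions, as are the function names and toy distributions.

```python
import itertools
import math


def kl(p, q, eps=1e-12):
    """KL divergence D(p || q); eps guards against zero entries in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rms(p, q):
    """Root-mean-square difference between two word distributions."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p))


def best_match(true_topics, est_topics, distance):
    """Enumerate assignments of estimated to true topics; return (score, perm)."""
    k = len(true_topics)
    best = None
    for perm in itertools.permutations(range(len(est_topics)), k):
        score = sum(distance(true_topics[i], est_topics[j])
                    for i, j in enumerate(perm)) / k
        if best is None or score < best[0]:
            best = (score, perm)
    return best


if __name__ == "__main__":
    true_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # hypothetical
    est_topics = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]   # hypothetical
    print(best_match(true_topics, est_topics, kl))
    print(best_match(true_topics, est_topics, rms))
```

Both distances select the same assignment here, in line with the quoted observation that the KL and RMS responses agreed. Note one design difference: when there are more estimated than true topics, this sketch picks the best-matching subset, whereas the quoted study omitted the topics with the smallest estimated probabilities. Enumeration costs grow factorially, so it is only practical for a handful of topics.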

Pareto charting using multifield freestyle text data applied to Toyota Camry user reviews

Allen

Xiong

2012

Appl Stoch Models Bus & Ind


This article proposes a method for Pareto charting that is based on unsupervised, freestyle text such as customer complaint, rework, scrap, or maintenance event descriptions. The proposed procedure is based on a slight extension of the latent Dirichlet allocation method to form multifield latent Dirichlet allocation. The extension is the usage of field‐specific dictionaries for multifield databases and changes to recommended default prior settings. We use a numerical study to motivate the prior setting selection. A real‐world case study associated with user reviews of Toyota Camry vehicles is used to illustrate the practical value of the proposed methods. The results indicate that only 4% of the words written by Consumer Reports reviewers from the last 10 years relate to the widely publicized unintended acceleration issue. Copyright © 2012 John Wiley & Sons, Ltd.
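Once each word has been assigned to a topic (for example, by a latent Dirichlet allocation fit), a Pareto chart needs only the share of words per topic, sorted in decreasing order, with a running cumulative percentage. The sketch below computes that table; it assumes the topic assignments are already available and does not implement the multifield LDA extension itself. The topic labels and counts are hypothetical and are not the Toyota Camry results.

```python
from collections import Counter


def pareto_table(topic_assignments):
    """Return (topic, count, percent, cumulative percent) rows, largest first."""
    counts = Counter(topic_assignments)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for topic, n in counts.most_common():  # sorted by count, descending
        share = 100.0 * n / total
        cumulative += share
        rows.append((topic, n, share, cumulative))
    return rows


if __name__ == "__main__":
    # Hypothetical per-word topic labels.
    words = ["brakes"] * 5 + ["noise"] * 3 + ["seats"] * 2
    for topic, n, share, cum in pareto_table(words):
        print(f"{topic:<8} {n:>3} {share:6.1f}% {cum:6.1f}%")
```

The bars of the chart are the per-topic percentages and the line is the cumulative column; a statement such as "only 4% of the words relate to issue X" corresponds to reading a single row of this table.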
