When a Web user's underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated to an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
Venue recommendation systems aim to effectively rank a list of interesting venues users should visit based on their historical feedback (e.g. checkins). Such systems are increasingly deployed by Locationbased Social Networks (LBSNs) such as Foursquare and Yelp to enhance their usefulness to users. Recently, various RNN architectures have been proposed to incorporate contextual information associated with the users' sequence of checkins (e.g. time of the day, location of venues) to effectively capture the users' dynamic preferences. However, these architectures assume that different types of contexts have an identical impact on the users' preferences, which may not hold in practice. For example, an ordinary contextsuch as the time of the day-reflects the user's current contextual preferences, whereas a transition context-such as a time interval from their last visited venue-indicates a transition effect from past behaviour to future behaviour. To address these challenges, we propose a novel Contextual Attention Recurrent Architecture (CARA) that leverages both sequences of feedback and contextual information associated with the sequences to capture the users' dynamic preferences. Our proposed recurrent architecture consists of two types of gating mechanisms, namely 1) a contextual attention gate that controls the influence of the ordinary context on the users' contextual preferences and 2) a time-and geo-based gate that controls the influence of the hidden state from the previous checkin based on the transition context. Thorough experiments on three large checkin and rating datasets from commercial LBSNs demonstrate the effectiveness of our proposed CARA architecture by significantly outperforming many state-of-the-art RNN architectures and factorisation approaches.
Dynamic pruning strategies permit efficient retrieval by not fully scoring all postings of the documents matching a query -without degrading the retrieval effectiveness of the topranked results. However, the amount of pruning achievable for a query can vary, resulting in queries taking different amounts of time to execute. Knowing in advance the execution time of queries would permit the exploitation of online algorithms to schedule queries across replicated servers in order to minimise the average query waiting and completion times. In this work, we investigate the impact of dynamic pruning strategies on query response times, and propose a framework for predicting the efficiency of a query. Within this framework, we analyse the accuracy of several query efficiency predictors across 10,000 queries submitted to in-memory inverted indices of a 50-million-document Web crawl. Our results show that combining multiple efficiency predictors with regression can accurately predict the response time of a query before it is executed. Moreover, using the efficiency predictors to facilitate online scheduling algorithms can result in a 22% reduction in the mean waiting time experienced by queries before execution, and a 7% reduction in the mean completion time experienced by users.
Abstract. Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier also offers a range of document weighting and query expansion models, based on the Divergence From Randomness framework. It has been successfully used for ad-hoc retrieval, cross-language retrieval, Web IR and intranet search, in a centralised or distributed setting.
Venue recommendation is an important application for LocationBased Social Networks (LBSNs), such as Yelp, and has been extensively studied in recent years. Matrix Factorisation (MF) is a popular Collaborative Filtering (CF) technique that can suggest relevant venues to users based on an assumption that similar users are likely to visit similar venues. In recent years, deep neural networks have been successfully applied to tasks such as speech recognition, computer vision and natural language processing. Building upon this momentum, various approaches for recommendation have been proposed in the literature to enhance the e ectiveness of MF-based approaches by exploiting neural network models such as: word embeddings to incorporate auxiliary information (e.g. textual content of comments); and Recurrent Neural Networks (RNN) to capture sequential properties of observed user-venue interactions. However, such approaches rely on the traditional inner product of the latent factors of users and venues to capture the concept of collaborative ltering, which may not be su cient to capture the complex structure of user-venue interactions. In this paper, we propose a Deep Recurrent Collaborative Filtering framework (DRCF) with a pairwise ranking function that aims to capture user-venue interactions in a CF manner from sequences of observed feedback by leveraging Multi-Layer Perception and Recurrent Neural Network architectures. Our proposed framework consists of two components: namely Generalised Recurrent Matrix Factorisation (GRMF) and Multi-Level Recurrent Perceptron (MLRP) models. In particular, GRMF and MLRP learn to model complex structures of user-venue interactions using element-wise and dot products as well as the concatenation of latent factors. In addition, we propose a novel sequence-based negative sampling approach that accounts for the sequential properties of observed feedback and geographical location of venues to enhance the quality of venue suggestions, as well as alleviate the cold-start users problem. Experiments on three large checkin and rating datasets show the e ectiveness of our proposed framework by outperforming various state-of-the-art approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.