ABSTRACT. A keyword query is the representation of a user's information need, and is the result of a complex cognitive process which often results in under-specification. We propose an unsupervised method, Latent Concept Modeling (LCM), for mining and modeling latent search concepts in order to recreate the conceptual view of the original information need. We use Latent Dirichlet Allocation (LDA) to exhibit highly specific, query-related topics from pseudo-relevance feedback documents. We define these topics as the latent concepts of the user query. We perform a thorough evaluation of our approach over two large ad hoc TREC collections. Our findings reveal that the proposed method accurately models latent concepts, while being very effective in a query expansion retrieval setting.
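To make the pipeline concrete, the following is a minimal sketch of this kind of approach in Python, assuming gensim's LdaModel as the topic model; retrieve_top_docs is a hypothetical stand-in for any first-pass retrieval (e.g., BM25) and is not part of the method's published code:

    # Hypothetical sketch of the LCM pipeline described above: run LDA over
    # pseudo-relevance feedback documents and treat the learned topics as the
    # latent concepts of the query.
    from gensim import corpora
    from gensim.models import LdaModel

    def latent_concepts(query, retrieve_top_docs, n_feedback=10, n_concepts=4):
        # 1. Pseudo-relevance feedback: take the top-ranked documents
        #    returned for the query (each document as a list of tokens).
        docs = retrieve_top_docs(query, k=n_feedback)

        # 2. Build a bag-of-words corpus over the feedback documents only,
        #    so the learned topics stay highly specific to this query.
        dictionary = corpora.Dictionary(docs)
        corpus = [dictionary.doc2bow(doc) for doc in docs]

        # 3. Fit LDA; each topic is interpreted as one latent concept.
        lda = LdaModel(corpus, num_topics=n_concepts, id2word=dictionary)

        # 4. Return each concept as its top weighted terms.
        return [lda.show_topic(t, topn=5) for t in range(n_concepts)]

The top-weighted terms of each concept can then feed a query expansion step, for instance by weighting expansion terms by their within-topic probability before re-running retrieval.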
Modern Information Retrieval (IR) systems have become more and more complex, involving a large number of parameters. For example, a system may choose from a set of possible retrieval models (BM25, language model, etc.), or various query expansion parameters, whose values greatly influence the overall retrieval effectiveness. Traditionally, these parameters are set at a system level based on training queries, and the same parameters are then used for different queries. We observe that it may not be easy to set all these parameters separately, since they can be dependent. In addition, a global setting for all queries may not best fit all individual queries with different characteristics. The parameters should be set according to these characteristics. In this article, we propose a novel approach to tackle this problem by dealing with entire system configurations (i.e., a set of parameters representing an IR system behaviour) instead of selecting a single parameter at a time. The selection of the best configuration is cast as a problem of ranking different possible configurations given a query. We apply learning-to-rank approaches for this task. We exploit both the query features and the system configuration features in the learning-to-rank method so that the selection of configuration is query dependent. The experiments we conducted on four TREC ad hoc collections show that this approach can significantly outperform the traditional method of tuning system configuration globally (i.e., grid search) and leads to higher effectiveness than the top-performing systems of the TREC tracks. We also perform an ablation analysis of the impact of different features on the model's learning capability and show that query expansion features are among the most important for adaptive systems.

The study presented in this article is built on the results and conclusions of the previous descriptive analysis studies, but moves a step further by performing a predictive analysis: we investigate how system parameters can be set to fit a given query, i.e., a query-dependent setting of system parameters. We assume that some parameters of the system can be set on the fly at querying time, and that a retrieval system allows us to set different values for these parameters easily. This is indeed the case for most IR systems nowadays. Retrieval platforms such as Terrier [61], Lemur [70], or Lucene [53] allow us to set parameters for the retrieval step. For example, one may choose between several retrieval models (e.g., BM25, language models), different query expansion schemes, and so on. We target this group of parameters that can be set at query time. In contrast, we assume that an IR system has already built an index that cannot be changed easily. For example, it would be difficult to choose between different stemmers at query time, unless we construct several indexes using different stemmers. We exclude parameters that cannot be set at query time from this study.

The problem we tackle in this article is query-dependent parameter setting...
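As an illustration of this learning-to-rank formulation, the sketch below builds one training instance per (query, configuration) pair by concatenating query features with configuration features, labelled by the configuration's measured effectiveness on that query. It is a hypothetical sketch under assumptions, not the article's implementation: LGBMRanker is one concrete LambdaMART-style ranker, the attributes q.features and c.features are assumed accessors, and effectiveness values are binned into integer grades because lambdarank expects graded relevance labels.

    import numpy as np
    from lightgbm import LGBMRanker

    def build_instances(queries, configs, effectiveness):
        # effectiveness[q.id][c.id]: measured effectiveness (e.g., AP)
        # of configuration c on query q, obtained from training runs.
        X, y, group = [], [], []
        for q in queries:
            effs = [effectiveness[q.id][c.id] for c in configs]
            # Bin per-query effectiveness into integer grades 0..4,
            # since lambdarank expects graded relevance labels.
            grades = np.digitize(effs, np.quantile(effs, [0.2, 0.4, 0.6, 0.8]))
            for c, g in zip(configs, grades):
                # One instance = query features + configuration features.
                X.append(np.concatenate([q.features, c.features]))
                y.append(int(g))
            # All configurations of one query form one ranking group.
            group.append(len(configs))
        return np.array(X), np.array(y), group

    ranker = LGBMRanker(objective="lambdarank", n_estimators=200)
    # X, y, groups = build_instances(train_queries, configs, effectiveness)
    # ranker.fit(X, y, group=groups)

    def select_configuration(query, configs, ranker):
        # Score every candidate configuration for the incoming query
        # and apply the top-ranked one.
        X = np.array([np.concatenate([query.features, c.features])
                      for c in configs])
        return configs[int(np.argmax(ranker.predict(X)))]

At query time, the ranker scores all candidate configurations against the incoming query's features and the top-scored configuration is applied, which is what makes the parameter setting query dependent.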