Abstract. Providing access to relevant information adapted to the needs and the context of the user is a real challenge. The user context can be understood as the set of factors that describe the user's intentions and perceptions of his or her surroundings. It is difficult to find a contextual information retrieval system that takes all contextual factors into account. In this paper, two types of context, user context and query context, are integrated in an Information Retrieval (IR) model based on language modeling. Here, the query context includes linguistic and semantic knowledge about the user query, which is integrated in order to understand the user's information needs as precisely as possible. In addition, we consider one important factor of the user context: the user's domain of interest, or topic of interest. A thematic algorithm is proposed to describe the user context. We assume that each topic can be characterized by a set of documents from the experimental corpus. The documents of each topic are used to build a statistical language model, which is then used to expand the original query model and to re-rank the retrieved documents. Our experiments on the 20 Newsgroups corpus show that the proposed contextual approach significantly improves retrieval effectiveness compared to the basic approach, which does not consider contextual factors.
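The topic-based expansion and re-ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the linear interpolation of the query and topic models, the Jelinek-Mercer smoothing of document models, the cross-entropy scoring, the parameters `lam`, `mu`, and `top_k`, all function names, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model estimated from a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def expand_query(query, topic_model, lam=0.3, top_k=5):
    """Interpolate the query model with the top_k most probable topic terms.

    Assumed form: P(w|Q') is proportional to (1 - lam) * P(w|Q) + lam * P(w|T).
    """
    q_model = unigram_model([query])
    top_terms = sorted(topic_model, key=topic_model.get, reverse=True)[:top_k]
    vocab = set(q_model) | set(top_terms)
    mixed = {
        w: (1 - lam) * q_model.get(w, 0.0) + lam * topic_model.get(w, 0.0)
        for w in vocab
    }
    norm = sum(mixed.values())
    return {w: p / norm for w, p in mixed.items()}


def score(query_model, doc, collection_model, mu=0.5):
    """Sum of P(w|Q') * log P(w|D), with the document model smoothed by
    the collection model (Jelinek-Mercer, assumed weight mu)."""
    d_model = unigram_model([doc])
    total = 0.0
    for w, p_q in query_model.items():
        p_d = (1 - mu) * d_model.get(w, 0.0) + mu * collection_model.get(w, 0.0)
        if p_d > 0:  # terms unseen in the whole collection are skipped
            total += p_q * math.log(p_d)
    return total


def rank(query_model, docs, collection_model):
    """Return document indices sorted by decreasing score."""
    scores = [score(query_model, d, collection_model) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus (hypothetical): documents 0 and 2 stand in for a "space" topic.
docs = [
    "rocket launch delayed nasa",
    "hockey team won game",
    "nasa shuttle orbit mission",
]
collection_model = unigram_model(docs)
topic_model = unigram_model([docs[0], docs[2]])

expanded = expand_query("launch", topic_model)
ranking = rank(expanded, docs, collection_model)
print(ranking)
```

In this toy setting the query "launch" matches only the first document literally. After expansion with the topic model, the second space document shares terms with the expanded query and is therefore ranked above the hockey document, which is the kind of effect the topic context is meant to produce.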
In this paper, we propose an original approach to the text warehousing process. It is based on a decisional architecture that combines classical data warehousing tasks with information retrieval (IR) techniques. We first propose a new ETL process, named ETL-Text, for textual data integration. We then present a new Text Warehouse Model, denoted TWM, which takes into account both the structure and the semantics of the textual data. TWM is associated with new dimension types, including a metadata dimension and a semantic dimension. In addition, we propose a new analysis measure based on language modeling, which is widely used in the IR area. Moreover, our approach uses Wikipedia as an external knowledge source to extract the semantics of the textual documents. To validate our approach, we develop a prototype composed of several processing modules that illustrate the different steps of ETL-Text. We also use the 20 Newsgroups corpus to conduct our experiments.