This is the accepted version of the paper. This version of the publication may differ from the final published version.

Abstract

Being an important data exchange and information storage standard, XML has generated a great deal of interest, and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is based on either relational database systems or specialized semi-structured data management systems. In this paper, we propose a method for XML indexing based on the Information Retrieval (IR) system Okapi. Firstly, we review the structure of inverted files and explain why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then we explore a revised method implemented on Okapi using path indexing
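To make the idea of combining an inverted file with path information concrete, here is a minimal sketch in Python. It is not the paper's method and does not reflect Okapi's actual data structures, which the abstract does not describe. The names `build_path_index`, `search`, `DOCS` and `INDEX`, and the choice to store each posting as a `(doc_id, element_path)` pair, are assumptions made purely for illustration.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_path_index(docs):
    """Map each term to a set of (doc_id, element_path) postings.

    Hypothetical layout: a plain inverted file would store only doc_id,
    which is why it cannot answer "term inside <title>" queries.
    """
    index = defaultdict(set)

    def walk(doc_id, elem, prefix):
        path = f"{prefix}/{elem.tag}"
        # Only the element's leading text is indexed; tail text is ignored
        # to keep the sketch short.
        for token in (elem.text or "").lower().split():
            index[token].add((doc_id, path))
        for child in elem:
            walk(doc_id, child, path)

    for doc_id, xml_text in docs.items():
        walk(doc_id, ET.fromstring(xml_text), "")
    return index


def search(index, term, path_suffix=""):
    """Return sorted doc ids whose postings for `term` end with `path_suffix`."""
    postings = index.get(term.lower(), ())
    return sorted({doc for doc, path in postings if path.endswith(path_suffix)})


# Toy collection, invented for this example.
DOCS = {
    1: "<article><title>XML indexing</title><body>inverted files</body></article>",
    2: "<article><title>Probabilistic retrieval</title><body>XML search</body></article>",
}
INDEX = build_path_index(DOCS)
```

With this toy collection, `search(INDEX, "xml")` matches both documents, while `search(INDEX, "xml", "/title")` matches only document 1, because the path stored in each posting lets the query be restricted to a structural context.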
This paper reviews some of the past Okapi research and discusses the role of Okapi in the current TIPS project. The main purpose is to report new challenges faced by probabilistic text retrieval in the web environment and to indicate some of the solutions that are currently under investigation. In this context, extraction of indexing units from formatted document sources, user interface design, implementation of field searching and query expansion within the framework of probabilistic searching are discussed. The problem of maintaining session continuity in the web environment and a possible solution to this problem are outlined. Other challenges posed by the open nature of the web environment are also indicated. These include the difficulty of delimiting the boundaries of a search session and the potential of the web for collaborative information retrieval. A system for collaboratively filtering documents based on their contents is described in this connection. Issues surrounding the integration of Okapi with other pieces of software being developed for the TIPS project are also briefly discussed.