LONG PAPER. BaseX is an early adopter of the upcoming XQuery Full Text Recommendation. This paper presents some of the enhancements made to the XML database to fully support the language extensions. The system's data and index structures are described, and implementation details are given on the XQuery compiler, which supports sequential scanning, index-based, and hybrid processing of full-text queries. An experimental analysis and a look at the visual presentation of query results conclude the paper.
Current operating systems handle the mere storage of personal data in state-of-the-art filesystems remarkably well. Convenient access to and information retrieval from such data, however, is crucial to leverage the stored information. Here, database-style query languages can be of great use. We demonstrate a user-level filesystem implementation that is built on recent semi-structured database storage techniques. As such, it serves as a storage layer for the BaseX XQuery processor and, while it appears to the operating system as a conventional filesystem, a large part of its content can be queried using XPath/XQuery.
A key difference between traditional humanities research and the emerging field of digital humanities is that the latter aims to complement qualitative methods with quantitative data. In linguistics, this means the use of large corpora of text, which are usually annotated automatically using natural language processing tools. However, these tools do not exist for historical texts, so scholars have to work with unannotated data. We have developed a system for systematic, iterative exploration and annotation of historical text corpora, which relies on an XML database (BaseX) and in particular on the Full Text and Update facilities of XQuery.
With the rise of XML, the database community has been challenged by semi-structured data processing. Since the data type behind XML is the tree, state-of-the-art RDBMSs have learned to deal with such data (e.g., [18,5,6,16]). This paper introduces a Ph.D. project focused on the question of to what extent the tree-awareness of recent RDBMSs can be used to try, once again, to implement filesystems using database technology. Our main goal is to provide means to query the data stored in filesystems and to find ways to enhance and combine the data storage and query capabilities of operating systems using semi-structured database technology. Two DBMSs with relational XML storage, built on top of the XPath accelerator numbering scheme [14], are the foundations for our work. First, with BaseX, an XML database, we establish a link between user, database and filesystem content. BaseX allows visual access to filesystem data stored in the database. An integrated query interface allows users to filter query results at interactive response times. Second, we establish a link between DBMS and OS. We implement a filesystem in userspace backed by the MonetDB/XQuery system, a well-known relational database system, which integrates the Pathfinder XQuery compiler [5] and the MonetDB kernel [4]. As a result, the DBMS is mounted as a conventional filesystem by the operating system kernel. Consequently, access via the established (virtual) filesystem interface as well as database-enhanced access to the same data are provided.
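The abstract above mentions the XPath accelerator numbering scheme without describing it. As a rough illustration only, the following Python sketch shows the general pre/post-order idea behind such schemes: each tree node receives a preorder and a postorder rank, and axis steps such as descendant and ancestor become simple range comparisons over a flat table. The names `Node`, `encode`, `descendants`, `ancestors`, and `sample_tree`, as well as the directory layout, are hypothetical and chosen for this example; this is not the storage layout of BaseX or MonetDB/XQuery.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical tree node, e.g. a directory or a file."""
    name: str
    children: list = field(default_factory=list)


def encode(root):
    """Return rows (pre, post, name), listed in preorder, from one depth-first pass."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        index = len(rows)
        rows.append(None)  # reserve the slot so rows stay in preorder
        for child in node.children:
            visit(child)
        post = counters["post"]
        counters["post"] += 1
        rows[index] = (pre, post, node.name)

    visit(root)
    return rows


def descendants(rows, context):
    """Descendants have a larger pre rank and a smaller post rank than the context."""
    pre, post, _ = context
    return [row for row in rows if row[0] > pre and row[1] < post]


def ancestors(rows, context):
    """Ancestors have a smaller pre rank and a larger post rank than the context."""
    pre, post, _ = context
    return [row for row in rows if row[0] < pre and row[1] > post]


def sample_tree():
    """An assumed, filesystem-like example tree."""
    return Node("home", [
        Node("docs", [Node("a.txt"), Node("b.txt")]),
        Node("music", [Node("song.mp3")]),
    ])


if __name__ == "__main__":
    table = encode(sample_tree())
    docs = next(row for row in table if row[2] == "docs")
    print([name for _, _, name in descendants(table, docs)])
```

Because both axes reduce to comparisons on two integer columns, a relational engine can evaluate them with ordinary range predicates, which is one reason a numbering of this kind suits relational XML storage.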