This paper suggests that more powerful database systems (DBMS) can be built by supporting database procedures as full-fledged database objects. In particular, allowing fields of a database to be a collection of queries in the query language of the system is shown to allow the natural expression of complex data relationships. Moreover, many of the features present in object-oriented systems and semantic data models can be supported by this facility.
In order to implement this construct, extensions to a typical relational query language must be made, and considerable work on the execution engine of the underlying DBMS must be accomplished. This paper reports on the extensions for one particular query language and data manager and then gives performance figures for a prototype implementation. Even though the performance of the prototype is competitive with that of a conventional system, suggestions for improvement are presented.
Apache Hive is a widely used data warehouse system for Apache Hadoop, and has been adopted by many organizations for various big data analytics applications. Closely working with many users and organizations, we have identified several shortcomings of Hive in its file formats, query planning, and query execution, which are key factors determining the performance of Hive. In order to make Hive continuously satisfy the requests and requirements of processing increasingly high volumes data in a scalable and efficient way, we have set two goals related to storage and runtime performance in our efforts on advancing Hive. First, we aim to maximize the effective storage capacity and to accelerate data accesses to the data warehouse by updating the existing file formats. Second, we aim to significantly improve cluster resource utilization and runtime performance of Hive by developing a highly optimized query planner and a highly efficient query execution engine. In this paper, we present a community-based effort on technical advancements in Hive. Our performance evaluation shows that these advancements provide significant improvements on storage efficiency and query execution performance. This paper also shows how academic research lays a foundation for Hive to improve its daily operations.
Over the last two releases SQL Server has integrated two specialized engines into the core system: the Apollo column store engine for analytical workloads and the Hekaton in-memory engine for high-performance OLTP workloads. There is an increasing demand for real-time analytics, that is, for running analytical queries and reporting on the same system as transaction processing so as to have access to the freshest data. SQL Server 2016 will include enhancements to column store indexes and in-memory tables that significantly improve performance on such hybrid workloads. This paper describes four such enhancements: column store indexes on inmemory tables, making secondary column store indexes on diskbased tables updatable, allowing B-tree indexes on primary column store indexes, and further speeding up the column store scan operator.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.