Modern in-memory database systems must efficiently support mixed OLTP and OLAP workloads. A conventional approach to this requirement is to rely on ETL-style, application-driven data replication between two very different OLTP and OLAP systems, which sacrifices real-time reporting on operational data. An alternative approach is to run OLTP and OLAP workloads on a single machine, which eventually limits the scalability of OLAP query performance. To tackle this problem, we propose a novel database replication architecture called Asynchronous Parallel Table Replication (ATR). ATR supports OLTP workloads on one primary machine and heavy OLAP workloads on replicas. Row-store formats can be used for OLTP transactions at the primary, while column-store formats are used for OLAP analytical queries at the replicas. ATR is designed to support elastic scalability of OLAP query performance while minimizing both the overhead for transaction processing at the primary and the CPU consumption for replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme that exploits characteristics of multi-version concurrency control (MVCC) to enable real-time reporting by minimizing the propagation delay between the primary and the replicas. Through extensive experiments with a concrete implementation available in a commercial database system, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead to the primary.
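The division of labour described above (row-store primary, column-store replica, asynchronous log shipping) can be illustrated with a toy sketch. It is not the ATR algorithm: the class names `Primary` and `Replica`, the in-memory log queue, and the single-threaded replay loop are all assumptions made for illustration, and the optimistic lock-free parallel replay scheme is not modeled at all.

```python
from collections import deque


class Primary:
    """Hypothetical row-store primary: one dict per row, keyed by primary key."""

    def __init__(self):
        self.rows = {}
        self.log = deque()  # committed changes waiting to be shipped
        self.commit_ts = 0

    def commit(self, key, row):
        # Apply the write locally and append it to the replication log.
        # The primary does not wait for the replica (asynchronous replication).
        self.commit_ts += 1
        self.rows[key] = dict(row)
        self.log.append((self.commit_ts, key, dict(row)))
        return self.commit_ts


class Replica:
    """Hypothetical column-store replica: one list per column."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}
        self.position = {}  # primary key -> index into the column lists
        self.replayed_ts = 0

    def replay(self, log):
        # Drain the log in commit order (sequential here, for simplicity).
        while log:
            ts, key, row = log.popleft()
            pos = self.position.get(key)
            if pos is None:
                self.position[key] = len(self.position)
                for name, values in self.columns.items():
                    values.append(row[name])
            else:
                for name, values in self.columns.items():
                    values[pos] = row[name]
            self.replayed_ts = ts

    def total(self, column):
        # An analytical scan touches a single column list.
        return sum(self.columns[column])
```

In this sketch, the gap between `Primary.commit_ts` and `Replica.replayed_ts` plays the role of the visibility delay: an analytical query on the replica sees only the changes that have already been replayed.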
Database replication is widely known and used for high availability or load balancing in many practical database systems. In this paper, we show how a replication engine can be used for three important practical use cases that have not previously been studied well: 1) scaling out OLTP/OLAP-mixed workloads with partitioned replicas, 2) efficiently maintaining a distributed secondary index for a partitioned table, and 3) efficiently implementing an online re-partitioning operation. All three use cases are crucial for enabling a high-performance shared-nothing distributed database system. To support them more efficiently, we propose the concept of asymmetric-partition replication, in which replicas of a table can be partitioned independently, regardless of whether or how the primary copy is partitioned. In addition, we propose the optimistic synchronous commit protocol, which avoids the expensive two-phase commit without sacrificing transactional consistency. The proposed asymmetric-partition replication and its optimized commit protocol are incorporated in the production versions of the SAP HANA in-memory database system. Through extensive experiments, we demonstrate the significant benefits that the proposed replication engine brings to the three use cases.
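The idea of partitioning a replica independently of its primary can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAP HANA implementation: `PartitionedCopy`, `ReplicatedTable`, and the CRC-based hash routing are hypothetical, and writes are applied to every copy in one step, so the optimistic synchronous commit protocol is not modeled.

```python
import zlib


class PartitionedCopy:
    """Hypothetical copy of a table, hash-partitioned on one chosen column."""

    def __init__(self, key_column, partition_column, n_partitions):
        self.key_column = key_column
        self.partition_column = partition_column
        self.partitions = [dict() for _ in range(n_partitions)]

    def _route(self, value):
        # Deterministic hash so routing is stable across processes.
        return zlib.crc32(str(value).encode("utf-8")) % len(self.partitions)

    def upsert(self, row):
        key = row[self.key_column]
        # Drop any stale copy first: a changed partitioning value may
        # move the row to a different partition.
        for part in self.partitions:
            part.pop(key, None)
        self.partitions[self._route(row[self.partition_column])][key] = dict(row)

    def find_by_partition_column(self, value):
        # Only the single partition that can hold the value is scanned.
        part = self.partitions[self._route(value)]
        return [r for r in part.values() if r[self.partition_column] == value]


class ReplicatedTable:
    """Primary plus replicas, each with its own partitioning scheme."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, row):
        self.primary.upsert(row)
        for replica in self.replicas:
            replica.upsert(row)
```

For example, a primary partitioned on `order_id` paired with a replica partitioned on `customer` lets a lookup by customer touch one replica partition instead of every primary partition, which is roughly the role a distributed secondary index plays in the second use case.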