Abstract: With the tremendous growth in stored data, the role of database systems has become more significant than ever before. Standard query workloads, such as the TPC-C and TPC-H benchmark suites, are used to evaluate and tune the functionality and performance of database systems. Running and configuring benchmarks is a time-consuming task that requires substantial statistical expertise, owing to the enormous data size and the large number of queries in the workload. Subsetting can be used to reduce the number of queries in a workload. An existing workload subsetting technique selects queries based on the similarity of their ranks for low-level characteristics, such as cache miss rates, or based on the execution time required on different computer systems. However, many low-level characteristics are correlated and produce similar behaviors. In addition, raw execution time is too diffuse a metric to capture important performance bottlenecks. Our goal is to select a subset of queries that reproduces the same bottlenecks in the system as the original workload. In this paper, we propose a statistical approach for creating a database query workload based on performance bottlenecks (SCRAP). Our methodology takes a query workload and a set of system configuration parameters as inputs, and selects a subset of the queries from the workload based on the similarity of their performance bottlenecks. Experimental results using the TPC-H benchmark and the PostgreSQL database system show that the reduced workload and the original workload produce similar performance bottlenecks, and that the subset accurately estimates the total execution time.
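The selection step described above can be illustrated with a minimal sketch. It is not the SCRAP procedure itself, whose statistical details the abstract does not give. It assumes, hypothetically, that each query has already been summarized as a vector of per-parameter bottleneck effects (how strongly each configuration parameter influences that query), and it groups queries greedily by cosine distance, keeping one representative per group. The function names, the threshold, and the sample numbers are all illustrative assumptions.

```python
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero profile as dissimilar to everything
    return 1.0 - dot / (norm_a * norm_b)


def select_subset(profiles, threshold=0.1):
    """Greedily group queries with similar bottleneck profiles.

    profiles: dict mapping query name -> list of per-parameter effects
              (hypothetical input; one entry per configuration parameter).
    Returns a dict mapping each representative query to the queries it
    stands for. The keys form the reduced workload.
    """
    groups = {}
    for name, vec in profiles.items():
        for rep in groups:
            if cosine_distance(vec, profiles[rep]) <= threshold:
                groups[rep].append(name)
                break
        else:
            # No existing representative is close enough: start a new group.
            groups[name] = [name]
    return groups


# Illustrative, made-up profiles over three configuration parameters.
profiles = {
    "Q1": [0.9, 0.1, 0.0],
    "Q2": [0.8, 0.2, 0.0],
    "Q3": [0.0, 0.1, 0.9],
    "Q4": [0.1, 0.0, 0.8],
}
groups = select_subset(profiles)
print(sorted(groups))  # representatives kept in the reduced workload
```

In this toy input, queries dominated by the same parameter fall into the same group, so the reduced workload keeps one query per distinct bottleneck. A real study would likely replace the greedy pass with a proper clustering method and validate the subset against measured execution times.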