Anomaly diagnosis is vital to the performance of online transaction processing (OLTP) systems. Machine learning techniques can reason about complex relationships beyond human abilities and perform well on such problems. However, they rely on a large number of training samples of anomalies, which are in serious shortage in both industry and academia because they are difficult to collect. This shortage creates demand for a benchmark for anomaly reproduction and data collection.
In this paper, we propose DBPA, a benchmark for transactional database performance anomalies. Specifically, we identify nine common anomalies rooted in diverse influencing factors. For each anomaly, we carefully design a reproduction procedure that is consistent with its root cause in real-world databases. With the reproduction procedures, users can easily generate a dataset in a new environment and extend the benchmark with new anomaly types. For compound anomalies, we provide a generation algorithm that allows users to generate data for any possible combination of anomalies from existing collected data. We also provide a large dataset of both normal and anomalous monitoring data collected from various environments, facilitating the training of machine learning models and the evaluation of new algorithms for anomaly diagnosis.