The typical approach to learned DBMS components is to capture the behavior by running a representative set of queries and to use the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, training data has to be re-collected whenever the workload or the database changes. To overcome these limitations, we take a different route and propose a new data-driven approach for learned DBMS components that directly supports changes to the workload and data without the need for retraining. One might expect this to come at the price of lower accuracy, since workload-driven approaches can make use of more information. However, this is not the case. The results of our empirical evaluation demonstrate that our data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
Monitoring the development of infectious diseases is of great importance for the prevention of major outbreaks. Syndromic surveillance aims at developing algorithms that can detect outbreaks as early as possible by monitoring data sources that capture the occurrences of a certain disease. Recent research mainly concentrates on the surveillance of specific, known diseases, with a focus on defining the disease pattern under surveillance. Until now, little effort has been devoted to what we call non-specific syndromic surveillance, i.e., the use of all available data for detecting any kind of infectious disease outbreak. In this work, we give an overview of non-specific syndromic surveillance from the perspective of machine learning and propose a unified framework based on global and local modeling techniques. We also present a set of statistical modeling techniques that have not been used in a local modeling context before and can serve as benchmarks for the more elaborate machine learning approaches. In an experimental comparison of different approaches to non-specific syndromic surveillance, we found that these simple statistical techniques already achieve competitive results and sometimes even outperform more elaborate approaches. In particular, applying common syndromic surveillance methods in a non-specific setting seems to be promising.
Infectious disease surveillance is of great importance for the prevention of major outbreaks. Syndromic surveillance aims at developing algorithms that can detect outbreaks as early as possible by monitoring data sources that capture the occurrences of a certain disease. Recent research mainly focuses on the surveillance of specific, known diseases and on defining the disease pattern under surveillance. Until now, little effort has been devoted to what we call non-specific syndromic surveillance, i.e., the use of all available data for detecting any kind of outbreak, including outbreaks of infectious diseases that are unknown beforehand. In this work, we revisit published approaches for non-specific syndromic surveillance and present a set of simple statistical modeling techniques that can serve as benchmarks for more elaborate machine learning approaches. Our experimental comparison on established synthetic data and on real data into which we injected synthetic outbreaks shows that these benchmarks already achieve very competitive results and often outperform more elaborate algorithms.