Efficient data processing is critical for interactive visualization of analytic data sets. Inspired by the large amount of recent research on column-oriented stores, we have developed a new specialized analytic data engine tightly-coupled with the Tableau data visualization system.The Tableau Data Engine ships as an integral part of Tableau 6.0 and is intended for the desktop and server environments. This paper covers the main requirements of our project, system architecture and query-processing pipeline. We use real-life visualization scenarios to illustrate basic concepts and provide experimental evaluation.
Data sets are growing rapidly and there is an attendant need for tools that facilitate human analysis of them in a timely manner. To help meet this need, column-oriented databases (or "column stores") have come into wide use because of their low latency on analytic workloads. Column stores use a number of techniques to produce these dramatic performance techniques, including the ability to perform operations directly on compressed data.In this paper, we describe how the Tableau Data Engine (an internally developed column store) leverages a number of compression techniques to improve query performance. The approach is simpler than existing systems for operating on compressed data and more unified, removing the necessity for custom data access mechanisms. The approach also uses some novel metadata extraction techniques to improve the choices made by the system's run-time optimizer.
The rapid increase in data volumes and complexity of applied analytical tasks poses a big challenge for visualization solutions. It is important to keep the experience highly interactive, so that users stay engaged and can perform insightful data exploration.Query processing usually dominates the cost of visualization generation. Therefore, in order to achieve acceptable response times, one needs to utilize backend capabilities to the fullest and apply techniques, such as caching or prefetching. In this paper we discuss key data processing components in Tableau: the query processor, query caches, Tableau Data Engine [1, 2] and Data Server. Furthermore, we cover recent performance improvements related to the number and quality of remote queries, broader reuse of cached data, and application of inter and intra query parallelism.
Windowed aggregates are a SQL 2003 feature for computing aggregates in moving windows. Common examples include cumulative sums, local maxima and moving quantiles. With the advent over the last few years of easy-to-use data analytics tools, these functions are becoming widely used by more and more analysts, but some aggregates (such as local maxima) are much easier to compute than others (such as moving quantiles). Nevertheless, aggregates that are more difficult to compute, like quantile and mode (or "most frequent") provide more appropriate statistical summaries in the common situation when a distribution is not Gaussian and are an essential part of a data analysis toolkit. Recent work has described highly efficient windowed implementations of the most common aggregate function categories, including distributive 1 aggregates such as cumulative sums and algebraic aggregates such as moving averages. But little has been published on either the implementation or the performance of the more complex holistic windowed aggregates such as moving quantiles. This paper provides the first in-depth study of how to efficiently implement the three most common holistic windowed aggregates (count distinct, mode and quantile) by reusing the aggregate state between consecutive frames. Our measurements show that these incremental algorithms generally achieve improvements of about 10× over naïve implementations, and that they can effectively detect when to reset the internal state during extreme frame variation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.