Progressive visualization holds considerable promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but also want to ask directed questions of the data. In a progressive visualization system, the online aggregation algorithm, not the user, determines the database sampling rate and the resulting convergence rate. In this paper, we extend Wander Join, a recent online aggregation method optimized for queries that join tables, one of the most computationally expensive operations. Our extension leverages importance sampling to enable user-driven sampling when the query contains joins. We apply user interaction techniques that let the user view and adjust the convergence rate, providing more transparency and control over the online aggregation process. Through importance sampling, our extension of Wander Join also supports stratified sampling of groups when the data distribution is skewed. We also improve the convergence rate of filtering queries, at the cost of additional overhead not incurred by the original Wander Join algorithm.
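To make the idea of user-driven importance sampling concrete, here is a minimal sketch under simplifying assumptions. It is not Wander Join: it samples rows from a single in-memory table rather than performing random walks over a join, and the function name `progressive_group_sums` and the `group_weight` parameter are hypothetical. It shows only the general principle that rows in groups the user cares about can be drawn more often, while dividing each sampled value by its sampling probability (a Hansen-Hurwitz style estimator) keeps the per-group SUM estimate unbiased.

```python
import random
from collections import defaultdict


def progressive_group_sums(rows, group_weight, n_samples, seed=0):
    """Estimate SUM(value) per group from weighted random samples.

    rows         -- list of (group, value) pairs (stand-in for a table)
    group_weight -- dict mapping group -> positive weight; groups not listed
                    get weight 1.0 (hypothetical "user interest" knob)
    n_samples    -- number of rows to draw, with replacement
    """
    rng = random.Random(seed)
    weights = [group_weight.get(group, 1.0) for group, _ in rows]
    total = sum(weights)
    probs = [w / total for w in weights]  # per-row sampling probability

    weighted_sums = defaultdict(float)
    counts = defaultdict(int)
    for idx in rng.choices(range(len(rows)), weights=probs, k=n_samples):
        group, value = rows[idx]
        # Dividing by the sampling probability corrects for oversampling.
        weighted_sums[group] += value / probs[idx]
        counts[group] += 1

    estimates = {g: s / n_samples for g, s in weighted_sums.items()}
    return estimates, dict(counts)


# Skewed toy data: group "b" is rare, so uniform sampling would rarely hit it.
rows = [("a", 1.0)] * 900 + [("b", 2.0)] * 100
estimates, counts = progressive_group_sums(rows, {"b": 9.0}, n_samples=20000)
print(estimates, counts)
```

In this toy setup, boosting the weight of group "b" means roughly half of the samples land in it even though it holds only 10% of the rows, so its estimate should tighten faster than under uniform sampling. A real system would also track a running confidence interval per group, which this sketch omits.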
In the era of big data, along with machine learning and databases, visualization has become critical to managing complex and overwhelming data problems. Vision science has been a foundation of data visualization for decades. As the systems that use visualization become more complex, advances in vision science are needed to provide fundamental theory that helps visualization researchers and practitioners address emerging challenges. In this paper, we present our work on modeling the perception of correlation in bivariate visualizations using Weber's Law. These Weber models can be applied to definitively compare and evaluate the effectiveness of these visualizations. We further demonstrate that these models hold because people approximate correlation using visual features that are known to follow Weber's Law. These findings have multiple implications. One practical implication is that results like these can guide practitioners in choosing the appropriate visualization. In the context of big data, this result can lead to perceptually driven computational techniques. For instance, it could be used to quickly sample from big data in a way that preserves important data features, which can lead to better computational performance, a less overwhelming user experience, and more fluid interaction.
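As a rough illustration of what fitting a Weber model involves, the sketch below uses the classical form of Weber's Law, in which the just-noticeable difference (JND) is proportional to stimulus magnitude, JND = k * I. The abstract does not state which stimulus measure or model form is used for correlation, so the function name `weber_fraction`, the choice of a line through the origin, and all of the numbers are assumptions made purely for illustration.

```python
def weber_fraction(magnitudes, jnds):
    """Least-squares slope k for the model JND = k * I (line through origin)."""
    numerator = sum(i * j for i, j in zip(magnitudes, jnds))
    denominator = sum(i * i for i in magnitudes)
    return numerator / denominator


# Hypothetical JND measurements for two chart types (invented numbers).
magnitudes = [0.2, 0.4, 0.6, 0.8]
k_chart_a = weber_fraction(magnitudes, [0.02, 0.04, 0.06, 0.08])
k_chart_b = weber_fraction(magnitudes, [0.05, 0.10, 0.15, 0.20])

# A smaller fitted k means smaller differences are noticeable.
better = "chart A" if k_chart_a < k_chart_b else "chart B"
print(k_chart_a, k_chart_b, better)
```

Under this simplified model, comparing the fitted k values gives one way to rank visualizations: the chart with the smaller k lets viewers detect smaller differences at the same stimulus magnitude.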