Data visualization is an effective mechanism for identifying trends, insights, and anomalies in data. On large datasets, however, generating visualizations can take a long time, delaying the extraction of insights, hampering decision making, and reducing exploration time. One solution is to use online sampling-based schemes to generate visualizations faster while improving the displayed estimates incrementally, eventually converging to the exact visualization computed on the entire data. However, the intermediate visualizations are approximate and often fluctuate drastically, leading to potentially incorrect decisions. We propose sampling-based incremental visualization algorithms that reveal the "salient" features of the visualization quickly (with a 46× speedup relative to baselines) while minimizing error, thus enabling rapid and error-free decision making. We demonstrate that these algorithms are optimal in terms of sample complexity, in that, given the level of interactivity, they generate approximations using as few samples as possible. We have developed the algorithms in the context of an incremental visualization tool, titled INCVISAGE, for trendline and heatmap visualizations. We evaluate the usability of INCVISAGE via user studies and demonstrate that users are able to make effective decisions with incrementally improving visualizations, especially compared to vanilla online-sampling-based schemes.
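To make the contrast concrete, the sketch below illustrates the vanilla online-sampling baseline the abstract refers to: draw small random batches per x-group, maintain running means, and redraw the trendline after each round. This is a minimal illustrative sketch, not INCVISAGE's algorithm; all function and variable names here are hypothetical.

```python
# Minimal sketch of a vanilla online-sampling trendline (NOT INCVISAGE's algorithm).
# The intermediate estimates improve as more samples arrive, but can fluctuate
# between rounds, which is exactly the problem the abstract highlights.
import random
from collections import defaultdict

def incremental_trendline(data_by_group, batch_size=100, rounds=10):
    """data_by_group: dict mapping x-value -> list of y-values (the full dataset)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in range(rounds):
        for x, values in data_by_group.items():
            # Draw a fresh random batch from this group's values.
            batch = random.choices(values, k=batch_size)
            sums[x] += sum(batch)
            counts[x] += len(batch)
        # Current approximate trendline: running mean per group.
        estimate = {x: sums[x] / counts[x] for x in data_by_group}
        yield r + 1, estimate  # the caller would re-render the visualization here

# Usage (hypothetical render function): redraw after each round.
# for round_no, trendline in incremental_trendline(dataset):
#     render(trendline)
```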
We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the E_γ-divergence. We then use this equivalent formulation to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary f-divergences. When combined with standard estimation-theoretic tools (such as Le Cam's and Fano's converse methods), this result allows us to study the trade-off between privacy and utility in several hypothesis testing problems and in minimax and Bayesian estimation problems.
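For reference, a sketch of the standard objects involved (these are the usual textbook definitions, assumed here; the precise form of the paper's equivalence is not reproduced):

```latex
% E_gamma-divergence (hockey-stick divergence), gamma >= 1:
\[
  E_\gamma(P \,\|\, Q) \;=\; \sup_{A} \bigl( P(A) - \gamma\, Q(A) \bigr).
\]
% A mechanism K is \varepsilon-LDP iff for all inputs x, x' and all output sets A,
%   K(A \mid x) \le e^{\varepsilon} K(A \mid x'),
% equivalently E_{e^{\varepsilon}}\bigl(K(\cdot \mid x) \,\|\, K(\cdot \mid x')\bigr) = 0.
% Contraction coefficient of an f-divergence under the channel K:
\[
  \eta_f(K) \;=\; \sup_{P \neq Q} \frac{D_f(PK \,\|\, QK)}{D_f(P \,\|\, Q)}.
\]
```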
Recent work of Acharya et al. (NeurIPS 2019) showed how to estimate the entropy of a distribution D over an alphabet of size k up to ±ε additive error by streaming over (k/ε³)·polylog(1/ε) i.i.d. samples and using only O(1) words of memory. In this work, we give a new constant-memory scheme that reduces the sample complexity to (k/ε²)·polylog(1/ε). We conjecture that this is optimal up to polylog(1/ε) factors.
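Spelling out the guarantee (a sketch under standard conventions; the constant success probability is the usual convention, not quoted from the paper):

```latex
% Shannon entropy of D with probability mass p over an alphabet of size k:
\[
  H(D) \;=\; \sum_{x=1}^{k} p(x) \log \frac{1}{p(x)}.
\]
% The scheme reads a stream of n i.i.d. samples from D, keeps O(1) words of
% memory, and outputs \hat{H} such that, with constant probability (e.g. 2/3),
\[
  |\hat{H} - H(D)| \;\le\; \varepsilon
  \qquad \text{using} \qquad
  n \;=\; \frac{k}{\varepsilon^{2}} \cdot \mathrm{polylog}(1/\varepsilon)
  \;\text{ samples.}
\]
```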
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.