Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
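To make the data cube operation mentioned above concrete, here is a minimal sketch of a plain, dense data cube that precomputes a count for every combination of kept and aggregated dimensions. It is not the nanocube structure itself, which the abstract does not specify. The names `build_cube` and `ALL` and the sample records are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

ALL = "*"  # hypothetical wildcard meaning "aggregate over this dimension"


def build_cube(records, dims):
    """Count records for every combination of kept/aggregated dimensions."""
    cube = Counter()
    for rec in records:
        # Each record contributes to 2**len(dims) cells of the cube.
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(rec[d] if d in kept else ALL for d in dims)
                cube[key] += 1
    return cube


# Illustrative records only; not data from the paper.
records = [
    {"region": "NE", "hour": 9, "device": "ios"},
    {"region": "NE", "hour": 9, "device": "android"},
    {"region": "SW", "hour": 17, "device": "ios"},
]
dims = ("region", "hour", "device")
cube = build_cube(records, dims)

# Any aggregate query is now a single dictionary lookup.
ne_total = cube[("NE", ALL, ALL)]
ios_total = cube[(ALL, ALL, "ios")]
```

Because every record touches 2^d cells for d dimensions, this naive layout grows quickly, which is the space concern the abstract raises about data cubes.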
Purpose: Cancer is familial; yet known cancer predisposition genes, as well as recognized environmental factors, explain only a small percentage of familial cancer clusters. This population-based description of cancer clustering describes patterns of cancer coaggregation suggestive of a genetic predisposition. Methods: Using a computerized genealogy of Utah families linked to a statewide cancer registry, we estimated the relative risks for 36 different cancer sites in first-, second-, and third-degree relatives of cancer cases, for each cancer site individually, and between cancer sites. We estimated the sex- and birth-year-specific rates for cancer using 1 million individuals in the resource. We applied these rates to groups of cases or relatives and compared the observed and expected numbers of cancers to estimate relative risks. Results: Many cancer sites show significantly elevated relative risks among distant relatives for cancer of the same site, strongly supporting a heritable contribution. Multiple combinations of cancer sites were observed among first-, second-, and third-degree relatives, suggesting the existence of heritable syndromes involving more than one cancer site. Conclusion: This complete description of coaggregation of cancer by site in a well-defined population provides a set of observations supporting heritable cancer predispositions and may support the existence of genetic factors for many different cancers. Genet Med 2012:14(1):107-114
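The observed-versus-expected comparison described in the methods can be sketched as follows. This is a simplified illustration that assumes one rate per (sex, birth cohort) stratum. The function names, field names, and numbers are all hypothetical and are not taken from the study.

```python
def expected_cases(group, rates):
    """Sum the stratum-specific cancer rate over everyone in the group."""
    return sum(rates[(p["sex"], p["birth_cohort"])] for p in group)


def relative_risk(group, rates):
    """Ratio of observed cancers in the group to the number expected."""
    observed = sum(1 for p in group if p["has_cancer"])
    return observed / expected_cases(group, rates)


# Hypothetical population rates per stratum (illustrative values only).
rates = {("F", "1940s"): 0.02, ("M", "1940s"): 0.03}

# Hypothetical group of 100 relatives, 5 of whom have the cancer of interest.
group = (
    [{"sex": "F", "birth_cohort": "1940s", "has_cancer": i < 2} for i in range(50)]
    + [{"sex": "M", "birth_cohort": "1940s", "has_cancer": i < 3} for i in range(50)]
)

rr = relative_risk(group, rates)  # 5 observed against roughly 2.5 expected
```

A relative risk above 1 indicates more cancers among the relatives than the population rates would predict; assessing significance requires a further statistical test that this sketch omits.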
As workflow systems become more widely used, the number of workflows and the volume of provenance they generate have grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and re-use this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation, using a collection of 1,700 workflow graphs, where we study the trade-offs of these representations and the effectiveness of alternative clustering techniques.
Fig. 1. Tag cloud lenses in BirdVis are used as a tool to explore and understand relative habitat preferences suggested by a species distribution model over space, time, and across species. Here, we show occurrence maps for the Indigo Bunting. The lenses cover three different regions on three different dates in the 2009 breeding season: May 4, June 22, and August 24. They show important differences in habitat preferences across these regions as well as how the preferences change over time within a region. The ability to interact with these visualizations, by changing regions and dates, and comparing different bird species, provides an unprecedented tool for scientists to understand bird distribution and consequently obtain insights about the environment and how it changes over time. The supplemental video gives an overview of different features and visualizations provided by BirdVis.
Birds are unrivaled windows into biotic processes at all levels and are proven indicators of ecological well-being. Understanding the determinants of species distributions and their dynamics is an important aspect of ecology and is critical for conservation and management. Through crowdsourcing, the eBird project has been collecting bird observation records since 2002. These observations, together with local-scale environmental covariates such as climate, habitat, and vegetation phenology, have been a valuable resource for a global community of educators, land managers, ornithologists, and conservation biologists. By associating environmental inputs with observed patterns of bird occurrence, predictive models have been developed that provide a statistical framework to harness available data for predicting species distributions and making inferences about species-habitat associations. Understanding these models, however, is challenging because they require scientists to quantify and compare multiscale spatiotemporal patterns.
A large series of coordinated or sequential plots must be generated, individually programmed, and manually composed for analysis. This hampers the exploration and is a barrier to making the cross-species comparisons that are essential for coordinating conservation and extracting important ecological information. To address these limitations, as part of a collaboration among computer scientists, statisticians, biologists and ornithologists, we have developed BirdVis, an interactive visualization system that supports the analysis of spatio-temporal bird distribution models. BirdVis leverages visualization techniques and uses them in a novel way to better assist users in the exploration of interdependencies among model parameters. Furthermore, the system allows for comparative visualization through coordinated views, providing an intuitive interface to identify relevant correlations and patterns. We justify our design decisions and present case studies that show how BirdVis has helped scientists obtain new evidence for existing hypotheses, as well as formulate new hypotheses in their domain.