Assessing the quality or validity of a piece of data is not usually done in isolation. You typically examine the context in which the data appears and try to determine its original sources or review the process through which it was created. This is not so straightforward when dealing with digital data, however: the result of a computation might have been derived from numerous sources and by applying complex successive transformations, possibly over long periods of time.

As the quantity of data that contributes to a particular result increases, keeping track of how different sources and transformations are related to each other becomes more difficult. This constrains the ability to answer questions regarding a result's history, such as: What were the underlying assumptions on which the result is based? Under what conditions does it remain valid? What other results were derived from the same data sources?

The metadata that needs to be systematically captured to answer those (or similar) questions is called provenance (or lineage). It refers to a graph describing the relationships among all the elements (sources, processing steps, contextual information, and dependencies) that contributed to the existence of a piece of data.

This article presents current research in this field from a practical perspective, discussing not only existing systems and the fundamental concepts needed for using them in applications today, but also future challenges and opportunities. A number of use cases illustrate how provenance might be useful in practice.

Where does data come from?

Consider the need to understand the conditions, parameters, or assumptions behind a given result: in other words, the ability to point at a piece of data, for example a research result or an anomaly in a system trace, and ask, "Where did it come from?" This would be useful for experiments involving digital data (such as in silico experiments in biology, other types of numerical simulations, or system evaluations in computer science).

The provenance for each run of such an experiment contains the links between results and the corresponding starting conditions or configuration parameters. This becomes especially important when considering processing pipelines, where some early results serve as the basis of further experiments. Manually tracking all the parameters from a final result through intermediary data back to the original sources is burdensome and error-prone.

Of course, researchers are not the only ones requiring this type of tracking. The same techniques could be used to help people in the business or financial sectors, for example, to figure out the set of assumptions behind the statistics reported to a board of directors, or to determine which mortgages were part of a traded security.
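Since provenance is described above as a graph, a small sketch can make the idea concrete. The structures and names below (ProvNode, trace_sources, and the toy two-stage pipeline) are illustrative assumptions rather than any particular provenance system's model; the point is only that answering "Where did it come from?" amounts to a backward traversal over recorded dependency edges.

```c
#include <stdio.h>

/* A provenance node: a data item or a processing step, with links to
 * the elements it was derived from. Purely illustrative; this is not
 * a standard provenance model. */
#define MAX_DEPS 4

typedef struct ProvNode {
    const char *label;                     /* e.g., "raw_trace.csv" */
    const char *kind;                      /* "source", "process", or "result" */
    const struct ProvNode *deps[MAX_DEPS]; /* what this node was derived from */
    int ndeps;
} ProvNode;

/* Walk backward from a result and print everything it depends on,
 * answering "where did this piece of data come from?" */
static void trace_sources(const ProvNode *n, int depth) {
    printf("%*s%s (%s)\n", depth * 2, "", n->label, n->kind);
    for (int i = 0; i < n->ndeps; i++)
        trace_sources(n->deps[i], depth + 1);
}

int main(void) {
    /* A toy pipeline: two sources feed a cleaning step, whose output
     * feeds an aggregation step that produces the final result. */
    ProvNode raw    = { "raw_trace.csv", "source", {0}, 0 };
    ProvNode params = { "config_params", "source", {0}, 0 };
    ProvNode clean  = { "cleaning_step", "process", { &raw, &params }, 2 };
    ProvNode stats  = { "aggregation",   "process", { &clean }, 1 };
    ProvNode result = { "final_report",  "result",  { &stats }, 1 };

    trace_sources(&result, 0);
    return 0;
}
```

Running this prints the full derivation tree of final_report, from the aggregation step down to raw_trace.csv and config_params, which is exactly the kind of backward question that manual tracking makes burdensome.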
Existing operating systems share a common kernel text section amongst all processes. Despite the benefits of kernel specialization for performance-guided optimization, exokernels, kernel fastpaths, and cheaper hardware access, it is not possible to specialize or tune the kernel so that different applications execute text optimized for their kernel use. Current specialization primitives involve system-wide changes to kernel text, which can have adverse effects on other processes sharing the kernel because of the global side effects. We present shadow kernels: a primitive that allows multiple kernel text sections to coexist in a contemporary operating system. By remapping kernel virtual memory on a context switch, or for individual system calls, we specialize the kernel on a fine-grained basis. Our implementation of shadow kernels uses the Xen hypervisor and so can be applied to any operating system that runs on Xen.
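To make the remapping idea in the abstract concrete, here is a small user-space simulation of the switching logic. Everything in it (shadow_text_mfns, xen_remap_page, shadow_kernel_switch, the frame numbers) is a hypothetical sketch; the actual implementation operates on real page tables through Xen's memory-management interfaces and is considerably more involved.

```c
#include <stdio.h>
#include <stddef.h>

/* Toy simulation of the shadow-kernel idea: each process carries a
 * shadow-kernel id, and a context switch remaps the kernel text pages
 * to that shadow's machine frames. All names and numbers here are
 * illustrative assumptions, not the paper's implementation. */

#define KTEXT_PAGES 4              /* kept tiny for the demo */
#define PAGE_SIZE   4096UL
#define NUM_SHADOWS 2

typedef unsigned long mfn_t;       /* machine frame number (Xen terminology) */

static const unsigned long ktext_start = 0xffffffff81000000UL;

/* One set of text frames per shadow kernel; in a real system these
 * would be produced by copying and patching the pristine kernel text. */
static mfn_t shadow_text_mfns[NUM_SHADOWS][KTEXT_PAGES] = {
    { 100, 101, 102, 103 },        /* shadow 0: unmodified kernel */
    { 200, 201, 202, 203 },        /* shadow 1: specialized variant */
};

/* Stand-in for a mapping update issued through the hypervisor. */
static void xen_remap_page(unsigned long va, mfn_t mfn)
{
    printf("  remap va %#lx -> mfn %lu\n", va, mfn);
}

/* Point every kernel-text virtual page at the frames of shadow `id`. */
static void remap_kernel_text(int id)
{
    for (size_t i = 0; i < KTEXT_PAGES; i++)
        xen_remap_page(ktext_start + i * PAGE_SIZE, shadow_text_mfns[id][i]);
}

struct task { int shadow_id; };

/* Hook in the scheduler's context-switch path: remap only when the
 * outgoing and incoming processes use different kernel variants. */
static void shadow_kernel_switch(const struct task *prev, const struct task *next)
{
    if (prev->shadow_id != next->shadow_id)
        remap_kernel_text(next->shadow_id);
}

int main(void)
{
    struct task a = { 0 }, b = { 1 };
    printf("switch a -> b:\n");
    shadow_kernel_switch(&a, &b);   /* remaps to shadow 1 */
    printf("switch b -> b:\n");
    shadow_kernel_switch(&b, &b);   /* no remapping needed */
    return 0;
}
```

The guard in shadow_kernel_switch reflects the fine-grained nature of the primitive: remapping cost is paid only when consecutively scheduled processes want different kernel variants.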