Assessing the quality or validity of a piece of data is not usually done in isolation. You typically examine the context in which the data appears and try to determine its original sources or review the process through which it was created. This is not so straightforward when dealing with digital data, however: the result of a computation might have been derived from numerous sources and by applying complex successive transformations, possibly over long periods of time.

As the quantity of data that contributes to a particular result increases, keeping track of how different sources and transformations are related to each other becomes more difficult. This constrains the ability to answer questions regarding a result's history, such as: What were the underlying assumptions on which the result is based? Under what conditions does it remain valid? What other results were derived from the same data sources?

The metadata that needs to be systematically captured to answer those (or similar) questions is called provenance (or lineage). It refers to a graph describing the relationships among all the elements (sources, processing steps, contextual information, and dependencies) that contributed to the existence of a piece of data.

This article presents current research in this field from a practical perspective, discussing not only existing systems and the fundamental concepts needed for using them in applications today, but also future challenges and opportunities. A number of use cases illustrate how provenance might be useful in practice.

Where does data come from? Consider the need to understand the conditions, parameters, or assumptions behind a given result: in other words, the ability to point at a piece of data, such as a research result or an anomaly in a system trace, and ask: Where did it come from? This would be useful for experiments involving digital data (such as in silico experiments in biology, other types of numerical simulations, or system evaluations in computer science). The provenance for each run of such experiments contains the links between results and the corresponding starting conditions or configuration parameters. This becomes especially important in processing pipelines, where early results serve as the basis of further experiments. Manually tracking all the parameters from a final result through intermediary data back to the original sources is burdensome and error-prone.

Of course, researchers are not the only ones who require this type of tracking. The same techniques could help people in the business or financial sectors, for example, in figuring out the set of assumptions behind the statistics reported to a board of directors, or in determining which mortgages were part of a traded security.
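Since provenance is described above as a graph linking sources, processing steps, and results, a minimal sketch may help make the idea concrete. The sketch below is illustrative only: the dictionary-based representation and the `record`/`lineage` helpers are assumptions made for this example (the edge names loosely echo W3C PROV's "used" and "wasGeneratedBy" relations), not any specific system discussed in the article.

```python
# Minimal, illustrative provenance graph: data items and processing steps are
# nodes; edges record which inputs a step used and which outputs it generated.
from dataclasses import dataclass, field

@dataclass
class ProvGraph:
    used: dict = field(default_factory=dict)          # activity -> inputs it read
    generated_by: dict = field(default_factory=dict)  # output -> activity that produced it

    def record(self, activity, inputs, outputs):
        self.used.setdefault(activity, []).extend(inputs)
        for out in outputs:
            self.generated_by[out] = activity

    def lineage(self, item):
        """Walk backwards from a result to all contributing sources."""
        sources, frontier = set(), [item]
        while frontier:
            current = frontier.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                sources.add(current)              # no recorded producer: a raw source
                continue
            frontier.extend(self.used.get(activity, []))
        return sources

g = ProvGraph()
g.record("clean_data", inputs=["raw_trace.csv"], outputs=["clean.parquet"])
g.record("run_experiment", inputs=["clean.parquet", "config.yaml"], outputs=["result.json"])
print(g.lineage("result.json"))   # {'raw_trace.csv', 'config.yaml'}
```

Answering "Where did it come from?" then amounts to walking such a graph backwards from a result to the sources and parameters it depends on.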
Distributed analytics engines such as Spark are a common choice for processing extremely large datasets. However, finding good configurations for these systems remains challenging, with each workload potentially requiring a different setup to run optimally. Using suboptimal configurations incurs significant extra runtime costs.

We propose Tuneful, an approach that efficiently tunes the configuration of in-memory cluster computing systems. Tuneful combines incremental sensitivity analysis and Bayesian optimization to identify near-optimal configurations from a high-dimensional search space using a small number of executions. This setup allows the tuning to be done online, without any previous training. Our experimental results show that Tuneful reduces the search time for finding close-to-optimal configurations by 62% (at the median) compared with existing state-of-the-art techniques. This means that the tuning cost is amortized significantly faster, enabling practical tuning for new classes of workloads.
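The abstract describes a two-stage idea: first identify which configuration parameters the runtime is actually sensitive to, then run Bayesian optimization only over those. The sketch below is a generic illustration of that pattern, not Tuneful's implementation; the parameter names, the `run_workload` cost function, and the use of scikit-optimize's `gp_minimize` are assumptions made for the example.

```python
# Illustrative two-stage tuning loop (not Tuneful itself):
#  1) crude sensitivity analysis: perturb one parameter at a time and keep the
#     parameters whose perturbation changes runtime the most;
#  2) Bayesian optimization (Gaussian-process based) over the kept parameters.
import random
from skopt import gp_minimize
from skopt.space import Real

# Hypothetical Spark-like configuration space: name -> (low, high, default)
SPACE = {
    "executor_memory_gb": (1.0, 16.0, 4.0),
    "executor_cores":     (1.0, 8.0, 2.0),
    "shuffle_partitions": (8.0, 512.0, 200.0),
    "memory_fraction":    (0.3, 0.9, 0.6),
}

def run_workload(config):
    """Placeholder for submitting the workload with `config` and timing it."""
    return sum((v - SPACE[k][2]) ** 2 for k, v in config.items()) + random.random()

def sensitive_params(top_k=2):
    """Keep the top_k parameters whose perturbation changes runtime the most."""
    base = run_workload({k: d for k, (lo, hi, d) in SPACE.items()})
    impact = {}
    for name, (lo, hi, default) in SPACE.items():
        perturbed = {k: d for k, (l, h, d) in SPACE.items()}
        perturbed[name] = hi  # push one parameter to its upper bound
        impact[name] = abs(run_workload(perturbed) - base)
    return sorted(impact, key=impact.get, reverse=True)[:top_k]

names = sensitive_params()
dims = [Real(*SPACE[n][:2], name=n) for n in names]
result = gp_minimize(
    lambda xs: run_workload(dict(zip(names, xs))),  # cost of a candidate config
    dims, n_calls=15, random_state=0)
print("tuned:", dict(zip(names, result.x)))
```

Pruning the search space before the Gaussian-process loop is what keeps the number of expensive workload executions small enough for online tuning.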
Technical developments in neurobiology have reached a point where the acquisition of high-resolution images representing individual neurons and synapses becomes possible. For this, the brain tissue samples are sliced using a diamond knife and imaged with electron microscopy (EM). However, the technique achieves a low resolution in the cutting direction, due to limitations of the mechanical process, making a direct visualization of a dataset difficult. We aim to increase the depth resolution of the volume by adding new image slices interpolated from the existing ones, without requiring modifications to the EM image-capturing method. As classical interpolation methods do not provide satisfactory results on this type of data, the current paper proposes a re-framing of the problem in terms of motion volumes, treating the depth axis as a temporal axis. An optical flow method is adapted to estimate the motion vectors of pixels in the EM images, and this information is used to compute and insert multiple new images at certain depths in the volume. We evaluate the visualization results against interpolation methods currently used on EM data, transforming the highly anisotropic original dataset into one with a higher depth resolution. The interpolation based on optical flow better reveals neurite structures with realistic, undistorted shapes and makes it easier to map neuronal connections.
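The paper adapts a specific optical flow method to EM data; the sketch below is only a generic illustration of the flow-warp-blend idea it builds on, using OpenCV's Farneback flow as a stand-in. The `interpolate_slice` function and its parameter values are assumptions for the example, not the paper's algorithm.

```python
# Generic sketch of flow-based slice interpolation: estimate dense optical flow
# between two adjacent (8-bit grayscale) EM slices, then warp both towards an
# intermediate depth t in (0, 1) and blend the two estimates.
import cv2
import numpy as np

def interpolate_slice(slice_a, slice_b, t=0.5):
    """Synthesize an intermediate slice between two adjacent EM images."""
    # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n,
    # poly_sigma, flags (values here are arbitrary but reasonable defaults).
    flow = cv2.calcOpticalFlowFarneback(slice_a, slice_b, None,
                                        0.5, 4, 21, 3, 7, 1.5, 0)

    h, w = slice_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))

    # Backward warping: the intermediate pixel at p comes from roughly
    # slice_a(p - t*flow) and slice_b(p + (1-t)*flow).
    map_a_x = (grid_x - t * flow[..., 0]).astype(np.float32)
    map_a_y = (grid_y - t * flow[..., 1]).astype(np.float32)
    map_b_x = (grid_x + (1 - t) * flow[..., 0]).astype(np.float32)
    map_b_y = (grid_y + (1 - t) * flow[..., 1]).astype(np.float32)

    warped_a = cv2.remap(slice_a, map_a_x, map_a_y, cv2.INTER_LINEAR)
    warped_b = cv2.remap(slice_b, map_b_x, map_b_y, cv2.INTER_LINEAR)

    # Linear blend weighted towards the nearer slice (t=1 reproduces slice_b).
    return ((1 - t) * warped_a + t * warped_b).astype(slice_a.dtype)
```

Repeating this for several values of t between each pair of captured slices is what turns the anisotropic stack into one with higher depth resolution, without changing the acquisition process.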
This experimental study presents a number of issues that pose a challenge for practical configuration tuning and its deployment in data analytics frameworks. These issues include:
1) the assumption of a static workload or environment, which ignores the dynamic characteristics of the analytics environment (e.g., growth in input data size, changes in resource allocation);
2) the amortization of tuning costs and how it determines which workloads can be tuned cost-effectively in practice (a break-even sketch follows below);
3) the need for a comprehensive incremental tuning solution for a diverse set of workloads.
We adapt different ML techniques to obtain efficient incremental tuning in our problem domain and propose Tuneful, a configuration tuning framework. We show how it is designed to overcome the above issues and illustrate its applicability by running a wide array of experiments in cloud environments provided by two different service providers.

CCS Concepts: • Theory of computation → Online learning algorithms; Gaussian processes; Non-parametric optimization.
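The amortization issue reduces to a simple break-even argument: tuning pays off only if the cost of the tuning executions is recovered by the per-run savings over the workload's remaining executions. The function and numbers below are hypothetical, purely to illustrate that argument.

```python
# Hypothetical break-even check for configuration tuning: tuning is worth it
# when (tuning cost) < (per-run saving) * (number of future runs).
def tuning_pays_off(tuning_runs, avg_tuning_run_cost, default_run_cost,
                    tuned_run_cost, future_runs):
    tuning_cost = tuning_runs * avg_tuning_run_cost
    saving_per_run = default_run_cost - tuned_run_cost
    return tuning_cost < saving_per_run * future_runs

# Example: 20 tuning executions at $1.50 each, a tuned run saving $0.40 over
# the $2.00 default, and 100 expected future executions of the workload.
print(tuning_pays_off(20, 1.50, 2.00, 1.60, 100))  # True: $30 < $40
```

Cutting the number of tuning executions, as the study aims to do, directly shrinks the left-hand side of this inequality and so widens the class of workloads for which tuning is cost-effective.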