Data streaming frameworks, built to run on large numbers of processing nodes in order to analyze big data, are fault-prone. The many nodes and network components that can fail are not the only source of errors: when developing data-analysis jobs, errors or wrong assumptions about the input data may only be detected during production processing. This usually forces a re-execution of the entire job, re-computing all input data. That can be a tremendous waste of computing time if most of the job's tasks are unaffected by the changes and would therefore process and produce exactly the same data again.

This paper describes an approach that uses materialized intermediate data from previous job executions to reduce the number of tasks that must be re-executed when a job is updated. Saving intermediate data to disk is a common technique for achieving fault tolerance in data streaming systems. These intermediate results can be reused for memoization, avoiding needless re-execution of tasks. We show that memoization can markedly decrease the runtime of an updated job.
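The core idea of reusing materialized intermediate results can be illustrated with a minimal sketch. All names here (`run_task`, `_digest`, the in-memory `_cache` dictionary) are hypothetical and stand in for the framework's on-disk materialization: a task's output is keyed by its identity, its code version, and a digest of its input, so an updated job only re-executes tasks whose code or input actually changed.

```python
import hashlib
import pickle

# Illustrative in-memory cache; a real streaming framework would
# materialize these intermediate results to disk for fault tolerance.
_cache = {}

def _digest(obj):
    """Stable digest of a task's input (assumes picklable, deterministic data)."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()

def run_task(task_id, task_version, task_fn, task_input):
    """Run a task, memoizing its result by (task id, code version, input digest)."""
    key = (task_id, task_version, _digest(task_input))
    if key in _cache:
        # Memoization hit: reuse the materialized intermediate result.
        return _cache[key]
    # Miss: only reached when the task's code or input changed.
    result = task_fn(task_input)
    _cache[key] = result
    return result
```

Under this scheme, re-submitting an updated job re-executes only the tasks downstream of the change; every unaffected task resolves to a cache hit, which is the source of the runtime savings discussed above.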