A checkpointing‐enabled and resource‐aware Java Virtual Machine for efficient and robust e‐Science applications in grid environments

Simão, José; Garrochinho, Tiago; Veiga, Luís

doi:10.1002/cpe.1879

Cited by 10 publications

(4 citation statements)

References 26 publications


“…Among them, the implementation of a fault tolerant Java Virtual Machine that supports checkpointing is proposed in [14], and the robustness of the critical paths of workflows is ensured in [15]. In both cases, an issue of the work is to find a balance between resource consumption and desired reliability.…”

Section: Experimental Middleware (mentioning)

confidence: 99%

Evaluation of an adaptive framework for resilient Monte Carlo executions

Montero

Rodríguez-Pascual

Mayo-García

2015

Proceedings of the 30th Annual ACM Symposium on Applied Computing


Solving certain calculations on time is crucial in some industrial, medical, and research areas. However, problems with high computational requirements are especially constrained by the dependability of the computational resources. Over the last decade, Distributed Computing Infrastructures have consolidated as the platform that can address this issue. Grid and cloud infrastructures can currently supply users with thousands of resources of different types. Nevertheless, despite the advances achieved, the nature of these platforms ultimately makes them unpredictable, especially grids. Users continuously experience failures and poor performance, and consequently these infrastructures are unfeasible for some calculations. One way to deal with this lack of dependability is to build adaptive algorithms specifically designed to increase the reliability of certain types of applications on these heterogeneous and dynamic infrastructures. In this work, the suitability of the Montera2 framework is evaluated for Monte Carlo calculations. For this purpose, the proposed approach is compared with the basic tools offered by current middleware by testing the execution of a set of different applications on production infrastructures.


“…In order to comply with this requisite, each instance of ARA-JVM is enhanced with services that allow for: i) monitor the application progress, ii) account resource consumption, iii) reconfigure internal parameters and/or mechanisms, and iv) checkpoint, restore and migrate the whole application. In [12] we focus on the last point which regards checkpointing and restore. Our current design and implementation effort mainly concerns the progress monitoring module and the resource scheduler.…”

Section: An Enabling Architecture (mentioning)

confidence: 99%

A progress and profile-driven cloud-VM for resource-efficiency and fairness in e-science environments

Simão

Veiga

2013

Proceedings of the 28th Annual ACM Symposium on Applied Computing

Self Cite


Cloud platforms are becoming more prevalent in e-Science domains, also by encompassing new and existing Grid infrastructures into private, hybrid and federated clouds. Clouds are inherently multi-tenant as they run workloads from multiple users. Resources can be initially allocated statically, as was previously done for job scheduling in Grids, but they can also be changed elastically at runtime to meet the application's effective needs. When allocation needs to be changed and resources are scarce, determining from which tenants resources must be taken so as to impact performance the least is a non-trivial problem, often deemed intractable outside the realm of batch scheduling with full prior information on resource requirements for each task, job, or VM instance. In this paper we present a Java-based platform for cloud environments that is able to: i) monitor application progress at different levels of detail with full application transparency, ii) account and restrict resource consumption, such as CPU and memory, by applications, and iii) redistribute resources among different JVM instances through a cluster-wide, decentralized algorithm based on the progress of different workloads. Evaluation shows it is able to improve resource-efficiency and fairness across e-Science private cloud infrastructures, by managing and migrating resources according to the previous criteria, driven by a number of novel proposed metrics inspired by economics.


“…Furthermore, the replay can only be necessary to be done from a certain point in time because the fault is known to occur only at the end of execution. Ditto uses a lightweight checkpointing mechanism [17] to offer two new replay services: (i) replay to most recent point before fault; (ii) replay to any instant M in execution. Checkpoint is done recording each thread stack and reachable objects.…”

Section: Lightweight Checkpointing (mentioning)

confidence: 99%
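
The statement above says a checkpoint records each thread stack and the objects reachable from it. As a rough illustration of what "reachable" means here, the following toy sketch walks an explicit reference graph starting from stack roots. It is only a model: objects are plain string ids, the reference map is supplied by hand, and the names `ReachabilitySketch` and `reachable` are hypothetical and not taken from Ditto or from any JVM API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: objects are string ids and references are an explicit map.
// None of these names come from Ditto or from any JVM API.
final class ReachabilitySketch {
    private ReachabilitySketch() {}

    /** Returns every object id reachable from the given stack roots. */
    static Set<String> reachable(List<String> stackRoots,
                                 Map<String, List<String>> references) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(stackRoots);
        while (!pending.isEmpty()) {
            String current = pending.removeFirst();
            if (!seen.add(current)) {
                continue; // already recorded; this also guards against cycles
            }
            for (String target : references.getOrDefault(current, List.of())) {
                pending.addLast(target);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a"),   // cycle back to a
            "d", List.of("c"));  // d is not referenced from any root
        System.out.println(reachable(List.of("a"), heap));
    }
}
```

In this model the object `d` is never visited because no root leads to it, so a checkpoint built this way would not need to store it.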

“…In this case, the total recording space is N * sizeof (checkpoint) + N * sizeof (truncatedLog), where N is the number of times a checkpoint is done. In this case there is a trade-off between overhead in execution time and granularity in available replay start times [17]. Even so, the total recording space is bounded to be smaller than 2 * N * sizeof (checkpoint).…”

Section: Lightweight Checkpointing (mentioning)

confidence: 99%
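
The recording-space expression in the statement above can be checked with a few lines of arithmetic. The sketch below simply evaluates N * sizeof(checkpoint) + N * sizeof(truncatedLog) and the quoted bound 2 * N * sizeof(checkpoint). The class name, method names, and byte sizes are illustrative assumptions, not values from the Ditto paper. Note that the bound is algebraically equivalent to each truncated log being smaller than one checkpoint.

```java
// Sketch only: class and method names are hypothetical, not taken from Ditto.
final class RecordingSpace {
    private RecordingSpace() {}

    /** Total space for n checkpoints, each paired with one truncated log. */
    static long total(long n, long checkpointBytes, long truncatedLogBytes) {
        return n * checkpointBytes + n * truncatedLogBytes;
    }

    /** Upper bound quoted in the statement: 2 * N * sizeof(checkpoint). */
    static long bound(long n, long checkpointBytes) {
        return 2 * n * checkpointBytes;
    }

    public static void main(String[] args) {
        // Illustrative sizes in bytes; the log is assumed smaller than a checkpoint.
        long n = 4, checkpoint = 1_000, log = 600;
        System.out.println("total = " + total(n, checkpoint, log));
        System.out.println("bound = " + bound(n, checkpoint));
    }
}
```

With these made-up sizes the total is 6400 bytes against a bound of 8000 bytes. Increasing N gives more replay start points at the cost of proportionally more space, which is the trade-off the statement describes.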

Ditto – Deterministic Execution Replayability-as-a-Service for Java VM on Multiprocessors

Silva

Simão

Veiga

2013

Middleware 2013

Self Cite


Alongside the rise of multi-processor machines, concurrent programming models have grown to near ubiquity. Programs built on these models are prone to bugs with rare preconditions, arising from unanticipated interactions between parallel tasks. Replayers can be efficient on uni-processor machines, but struggle with unreasonable overhead on multi-processors, both concerning slowdown of the execution time and size of the replay log. We present Ditto, a deterministic replayer for concurrent JVM applications executed on multi-processor machines, using both state-of-the-art and novel techniques. The main contribution of Ditto is a novel pair of recording and replaying algorithms that: (a) serialize memory accesses at the instance field level, (b) employ partial transitive reduction and program-order pruning on-the-fly, (c) take advantage of TLO static analysis, escape analysis and JVM compiler optimizations to identify thread-local accesses, and (d) take advantage of a lightweight checkpoint mechanism to avoid large logs in long running applications with fine granularity interactions, and for faster replay to any point in execution. The results show that Ditto out-performs previous deterministic replayers targeted at Java programs.

