Autonomous, failure-resilient orchestration of distributed discrete event simulations

Malensek, Matthew; Sui, Zhiquan; Harvey, Neil; Pallickara, Shrideep

doi:10.1145/2494621.2494625

Cited by 4 publications

(3 citation statements)

References 24 publications


“…Each modification of the input parameters requires a new set of iterations to be run. Dividing the target simulation into several units and executing them in parallel is one way to improve overall execution times [5,6], but generally does not enable real-time exploration. In this work, we target real-time computational guarantees that involve providing subsecond, interactive responses to the user as simulation parameters are changed.…”

Section: Introduction (mentioning)

confidence: 99%

Predictive analytics using statistical, learning, and ensemble methods to support real-time exploration of discrete event simulations

Budgaga, Malensek, Harvey, et al. 2016

Future Generation Computer Systems

Self Cite


Highlights

• Our approach enables fast, accurate forecasts of discrete event simulations.
• The framework copes with high dimensionality and voluminous datasets.
• We facilitate simulation execution with cycle scavenging and cloud resources.
• We create and evaluate several predictive models, including ensemble methods.
• Our framework is made accessible to end users through an interactive web interface.

Abstract

Discrete event simulations (DES) provide a powerful means for modeling complex systems and analyzing their behavior. DES capture all possible interactions between the entities they manage, which makes them highly expressive but also compute-intensive. These computational requirements often impose limitations on the breadth and/or depth of research that can be conducted with a discrete event simulation.

This work describes our approach for leveraging the vast quantity of computing and storage resources available in both private organizations and public clouds to enable real-time exploration of discrete event simulations. Rather than directly targeting simulation execution speeds, we autonomously generate and execute novel scenario variants to explore a representative subset of the simulation parameter space. The corresponding outputs from this process are analyzed and used by our framework to produce models that accurately forecast simulation outcomes in real time, providing interactive feedback and facilitating exploratory research.

Our framework distributes the workloads associated with generating and executing scenario variants across a range of commodity hardware, including public and private cloud resources. Once the models have been created, we evaluate their performance and improve prediction accuracy by employing dimensionality reduction techniques and ensemble methods. To make these models highly accessible, we provide a user-friendly interface that allows modelers and epidemiologists to modify simulation parameters and see projected outcomes in real time.


“…Authors' addresses: Z. Sui, M. Malensek, and S. Pallickara, Computer Science Department, Colorado State University, 1873 Campus Delivery, Fort Collins, CO 80523-1873, USA; Harvey, Department of Computing and Information Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada.…”

Section: Introduction (mentioning)

confidence: 99%

“…Ultimately, each of these items can have severe and unexpected performance consequences in a distributed setting and must be accounted for to ensure that resources are used efficiently. Our prior research [Malensek et al 2013], which focused on autonomous fault tolerance functionality, has been extended in this article to target the two remaining aspects of resource uncertainty. Resource slowdowns may occur due to an increase in the number of processes executing concurrently at the resource, load spikes, or runaway processes resulting from a coding error.…”

Section: Introduction (mentioning)

confidence: 99%

Autonomous Orchestration of Distributed Discrete Event Simulations in the Presence of Resource Uncertainty

Sui, Malensek, Harvey, et al. 2015

ACM Trans. Auton. Adapt. Syst.

Self Cite


Discrete event simulations model the behavior of complex, real-world systems. Simulating a wide range of events and conditions provides a more nuanced model, but also increases its computational footprint. To manage these processing requirements in a scalable manner, discrete event simulations can be distributed across multiple computing resources. Orchestrating the simulations in a distributed setting involves coping with resource uncertainty. We consider three key aspects of resource uncertainty: resource failures, heterogeneity, and slowdowns. Each of these aspects is managed autonomously, which involves making accurate predictions of future execution times and latencies while also accounting for differences in hardware capabilities and dynamic resource consumption profiles. Further complicating matters, individual tasks within the simulation are stateful and stochastic, requiring inter-task communication and synchronization to produce accurate outcomes. We deal with these challenges through intelligent state collection and migration, active resource monitoring, and empirical evaluation of resource capabilities under changing conditions. To underscore the viability of our solution, we provide benchmarks using a production discrete event simulation that can simultaneously sustain failures, manage resource heterogeneity, and handle slowdowns while being orchestrated by our framework.


Transparent three-phase Byzantine fault tolerance for parallel and distributed simulations

Cai, Turner, et al. 2016

Simulation Modelling Practice and Theory
