QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters

Zhu, Xiaomin; Qin, Xiao; Qiu, Meikang

doi:10.1109/tc.2011.68

Cited by 109 publications

(4 citation statements)

References 44 publications

“…Some works support fault tolerance in cloud environments by scheduling backup copies of failed tasks. For example, QAFT [32] schedules primary and backup copies of tasks on different cloud nodes, considering different QoS levels for the tasks and different speeds for the cloud nodes. However, it only considers independent aperiodic real-time tasks (and not parallel applications), and it does not consider a microservice model as FTRTC does.…”

Section: Related Work (mentioning)

confidence: 99%
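
The statement above describes placing a primary and a backup copy of each task on different nodes of differing speeds. A minimal Python sketch of that placement constraint follows. It is not the QAFT algorithm itself: the `Node` class, the `place_primary_backup` helper, the work/speed units, and the earliest-finish heuristic are all assumptions made for illustration, and QoS levels are omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float          # work units per second (hypothetical unit)
    free_at: float = 0.0  # time at which the node becomes idle


def place_primary_backup(work, deadline, nodes):
    """Pick two distinct nodes that can both finish `work` by `deadline`.

    Returns (primary_name, backup_name), or None when fewer than two
    nodes can meet the deadline. Illustrative heuristic only.
    """
    def finish(node):
        return node.free_at + work / node.speed

    # Keep only nodes that meet the deadline, fastest finish first.
    feasible = sorted((n for n in nodes if finish(n) <= deadline), key=finish)
    if len(feasible) < 2:
        return None

    primary, backup = feasible[0], feasible[1]
    # Reserve both nodes; a real scheduler could release the backup slot
    # once the primary completes successfully.
    primary.free_at = finish(primary)
    backup.free_at = finish(backup)
    return primary.name, backup.name
```

Because the two copies always come from distinct entries of `feasible`, a single node failure cannot take out both the primary and the backup, which is the property the cited statement attributes to this family of schedulers.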

Fault Tolerance in Real-Time Cloud Computing

Abeni, Andreoli, Gustafsson, et al. 2023

2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)

This paper presents the Fault-Tolerant Real-Time Cloud (FTRTC) project that aims to design cloud computing infrastructures capable of hosting highly reliable and real-time applications. These applications are characterized by strict timing and reliability constraints, as well as critical failure scenarios. For instance, such requirements are commonly found in the context of Industry 4.0. We present a formalization of the problem of designing real-time cloud applications supporting an adjustable level of fault tolerance throughout their distributed execution in a cloud infrastructure. The contributions presented in this paper indicate important research directions when building cloud infrastructures able to support ultra-reliable real-time applications.

“…A dedicated VM acts as a central scheduler, responsible for scheduling the component tasks of the application jobs to the other VMs of the cloud [28]. The SaaS cloud under study is shown in Figure 1.…”

Section: System Model (mentioning)

confidence: 99%

The impact of checkpointing interval selection on the scheduling performance of real‐time fine‐grained parallel applications in SaaS clouds under various failure probabilities

Stavrinides, Karatza 2017

Concurrency and Computation

Summary: As the adoption of Software as a Service (SaaS) cloud computing continues to gain momentum, the arising challenges of scheduling parallel applications on such platforms need to be addressed. Due to the complexity and the fine‐grained parallelism of the workload, as well as the multi‐tenancy of the underlying host environment, end‐user applications are usually prone to transient software failures. Therefore, fault tolerance is one of the most crucial aspects of scheduling in SaaS clouds. It is usually achieved through application‐directed checkpointing. However, selecting an appropriate checkpointing interval is not a trivial task. Unnecessarily frequent checkpointing may degrade the system performance. On the other hand, infrequent checkpointing may lead to greater recovery time and thus poorer performance. Consequently, the checkpointing interval must be selected taking into account the failure probability, as well as the nature of the workload. To this end, we investigate via simulation the impact of checkpointing interval selection on the performance of a SaaS cloud, where fine‐grained parallel applications with firm deadlines and approximate computations are scheduled for execution, under various failure probabilities. The simulation results are analyzed, in an attempt to shed light on the relation between the checkpointing interval and failure probability.
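
The trade-off described in this summary (frequent checkpoints cost time, infrequent ones increase rework after a failure) can be illustrated with Young's classical first-order approximation. This is a swapped-in textbook model, not the simulation used in the paper; the function names and the parameters `checkpoint_cost` and `mtbf` (mean time between failures) are assumptions for illustration.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order estimate of the checkpoint interval.

    checkpoint_cost: time to write one checkpoint (assumed constant).
    mtbf: mean time between failures, in the same time unit.
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """Approximate fraction of time lost to checkpointing plus rework.

    First term: one checkpoint per interval.
    Second term: on average half an interval is redone per failure.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)
```

Under this model, intervals both shorter and longer than `young_interval(...)` give a larger `overhead_fraction(...)`, which mirrors the qualitative claim in the summary that the interval has to be matched to the failure behaviour.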

“…He et al. [19] developed a rolling-horizon scheduling strategy for energy-constrained distributed real-time embedded systems. Zhu et al. [62] presented a fault-tolerant scheduling algorithm called QAFT for real-time tasks with QoS needs on heterogeneous clusters. Zhu et al. [63] proposed an adaptive energy-efficient scheduling algorithm, AEES, for aperiodic and independent real-time tasks on heterogeneous clusters with dynamic voltage scaling.…”

Section: Literature Review (mentioning)

confidence: 99%

“…The energy consumption of a computing node's processor can be approximated as $\alpha f v^{2} \Delta t$ [61][62][63], where $f$ is the processor clock frequency, $v$ is the supply voltage, $\Delta t$ is the execution time, and $\alpha$ is a constant. The frequency is positively correlated with the voltage, so reducing the voltage lowers the frequency.…”

Section: Energy Model (mentioning)

confidence: 99%
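
The energy approximation quoted above, α·f·v²·Δt, can be evaluated directly. The short Python sketch below does so; the function name `processor_energy` and the sample values are hypothetical. The usage note additionally assumes that frequency scales linearly with voltage and that execution time scales inversely with frequency, which is stronger than the "positively correlated" wording in the quoted statement.

```python
def processor_energy(alpha, freq, voltage, exec_time):
    """Energy estimate alpha * f * v^2 * dt from the quoted model.

    alpha: technology-dependent constant (value is an assumption here).
    freq: processor clock frequency.
    voltage: supply voltage.
    exec_time: execution time at that frequency.
    """
    return alpha * freq * voltage ** 2 * exec_time


# Baseline operating point (arbitrary illustrative units).
baseline = processor_energy(alpha=1.0, freq=2.0, voltage=3.0, exec_time=4.0)

# Halve voltage and frequency; the same work then takes twice as long
# under the linear-scaling assumption stated in the lead-in.
scaled = processor_energy(alpha=1.0, freq=1.0, voltage=1.5, exec_time=8.0)
```

Under these assumptions the f·Δt product stays constant, so energy falls with the square of the voltage, at the cost of a longer execution time that a real-time scheduler has to fit within the deadline.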