S^3DNN: Supervised Streaming and Scheduling for GPU-Accelerated Real-Time DNN Workloads

Zhou, Husheng; Bateni, Soroush; Liu, Cong

doi:10.1109/rtas.2018.00028

Cited by 63 publications

(34 citation statements)

References 28 publications


“…While the well-defined parallel model and the simple kernel software make attractive the use of traditional WCET analysis [42], recent studies showed how GPUs hide many details that can negatively affect the execution time [2] and that it is necessary to develop dedicated WCET analyses [28]. In the opposite view, other works noticed a substantial improvement in the real-time capability of the heterogeneous solution [69] [67]. A hybrid static and measurement-based solution to WCET estimation for GPU was proposed by Betts et al [8] in 2013, while the first pWCET approach was presented in 2014 by Berezovskyi et al [6] and its extension [5] in 2016.…”

Section: Heterogeneous Hardware and Predictability (mentioning)

confidence: 99%

Timing Predictability in High-Performance Computing With Probabilistic Real-Time

2020


Application requirements in High-Performance Computing (HPC) are becoming increasingly exacting, and the demand for computational resources is rising. In parallel, new application domains are emerging, as well as additional requirements, such as meeting real-time constraints. This requirement, typical of embedded systems, is difficult to guarantee when dealing with HPC infrastructures, due to the intrinsic complexity of the system. Traditional embedded systems static analyses to estimate the Worst-Case Execution Time (WCET) are not applicable to HPC, because modeling and analyzing all the system's hardware and software components is not practical. Measurement-based probabilistic analyses for the WCET emerged in the last decade to overcome these issues, but it requires the system to satisfy certain conditions to estimate a correct and safe WCET. In this work, we show the emerging application timing requirements, and we propose to exploit the probabilistic real-time theory to achieve the required time predictability. After a brief recap of the fundamentals of this methodology, we focus on its applicability to HPC systems, to check their ability to satisfy such conditions. In particular, we studied the advantages of having heterogeneous processors in HPC nodes and how resource management affects the applicability of the proposed technique.



“…Due to increased interest in GPU for accelerating parallel real-time applications, many real-time scheduling frameworks for GPU have been proposed in recent years [27,45,21,37], with a particular focus on DNN acceleration [76,69]. We first review works concerned with kernel scheduling, leaving more directly-related frameworks focusing on memory management to Section 2.3.3.…”

Section: Real-time Framework for GPU (mentioning)

confidence: 99%

“…We further assume that only one GPU kernel is executed at a time. While recent work has shown that co-scheduling multiple kernels can improve GPU resource utilization [76,39], it also complicates the issue of timing analysis. For this reason, we reserve such an extension to future work.…”

Section: System Model and Assumptions (mentioning)

confidence: 99%

Dynamic Memory Bandwidth Allocation for Real-Time GPU-Based SoC Platforms

Aghilinasab, Ali, Yun, et al.

2020

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.


“…The operation-to-device partitioning does not require modifications to the TensorFlow internals, and it can be performed with the default Python API for TensorFlow. Zhou et al [17] proposed a pipeline scheduling solution aimed at optimizing the execution of DNN workload on GPUs, while Yang et al [18] identified a combination of techniques to support multiple cameras with an improved throughput in the context of automated-driving systems. In the context of mobile devices, Lane et al [19] proposed two runtime algorithms to decompose a DNN model across available processors with the purpose of improving performance and energy-efficiency.…”

Section: Scheduling of DNN (mentioning)

confidence: 99%

Timing isolation and improved scheduling of deep neural networks for real‐time systems

2020


In recent years, the performance of deep neural networks (DNNs) is significantly improved, making them suitable for many application fields, such as autonomous driving, advanced robotics, and industrial control. Despite a lot of research being devoted to improving the accuracy of DNNs, only limited efforts have been spent to enhance their timing predictability, required in several real-time applications. This paper proposes a software infrastructure based on the Linux operating system to integrate DNNs within a real-time multicore system. It has been realized by modifying both the internal scheduler of the popular TensorFlow framework and the SCHED_DEADLINE scheduling class of Linux. The proposed infrastructure allows providing timing isolation of DNN inference tasks, hence improving the determinism of the temporal interference generated by TensorFlow. The proposal is finally evaluated with a case study derived from a state-of-the-art benchmark inspired by an autonomous industrial system. Extensive experiments demonstrate the effectiveness of the proposed solution and show a significant reduction of both average and longest-observed response times of TensorFlow tasks.

