Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225070
Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster

Cited by 5 publications (3 citation statements, published 2019–2022) · References 24 publications
“…When considering large-scale video workloads in hybrid CPU-GPU clusters, performance degradation often comes from the uncertainty and variability of workloads and the unbalanced use of heterogeneous resources. To accommodate this, Zhang et al. [256] use two deep Q-networks to build a two-level task scheduler, in which the cluster-level scheduler selects proper execution nodes for mutually independent video tasks and the node-level scheduler assigns interrelated video subtasks to appropriate computing units.…”
Section: Data Center Management
confidence: 99%
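The two-level structure described above can be sketched as follows. This is a minimal, hypothetical illustration: small tabular Q-functions with epsilon-greedy selection stand in for the paper's two deep Q-networks, and the state descriptors, action sets, and reward signal are illustrative assumptions, not details from the original work.

```python
import random

class QScheduler:
    """Epsilon-greedy action selection over a small tabular Q-function
    (a stand-in for one of the two deep Q-networks)."""
    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q = {}                      # (state, action) -> estimated value
        self.actions = actions
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def choose(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)     # explore
        return max(self.actions,                   # exploit current estimates
                   key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update from execution feedback.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)

random.seed(0)

# Cluster level: pick a worker node for a mutually independent video task.
cluster = QScheduler(actions=["node0", "node1", "node2"])
# Node level: assign an interrelated video subtask to a computing unit.
node = QScheduler(actions=["cpu", "gpu"])

task_state = "high_load"                 # illustrative workload descriptor
chosen_node = cluster.choose(task_state)
chosen_unit = node.choose("decode_subtask")

# In practice the reward would reflect measured performance
# (e.g. negative completion time); here it is a placeholder.
cluster.update(task_state, chosen_node, reward=1.0, next_state="high_load")
node.update("decode_subtask", chosen_unit, reward=1.0, next_state="decode_subtask")
```

The two schedulers keep separate value estimates but cooperate in sequence: the cluster-level decision fixes the node, and the node-level decision then places each subtask on a CPU or GPU within it.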
“…Applying ML to system design has a twofold meaning: (1) reducing the burden on human experts who design systems manually, and (2) closing the positive feedback loop, i.e., architecture/systems for ML and, simultaneously, ML for architecture/systems, encouraging improvements on both sides. These applications include predictive performance modeling [18,35,44,45,52,56,90], efficient design space exploration [36,38,49,92], cache replacement [5,70,80], prefetching [8,28,93], branch prediction [25,37], NoC design [21,63,85], power and resource management [4,31], task allocation [51,94], malware detection [15,59], compiler design [53,76], and so on.…”
Section: Related Work
confidence: 99%
“…Recent work has also explored multi-level scheduling in hybrid CPU-GPU clusters. Zhang et al. [84] proposed a deep reinforcement learning (DRL) framework that divides video workloads in two stages: first at the cluster level (selecting a worker node) and then at the node level (CPU vs. GPU). The two DRL models act separately but still work together to optimize overall throughput.…”
Section: Task Allocation and Resource Management
confidence: 99%