Efficient Dynamic Pinning of Parallelized Applications by Distributed Reinforcement Learning

Chasparis, Georgios C.; Rossbory, Michael

doi:10.1007/s10766-017-0541-y

Cited by 5 publications

(11 citation statements)

References 12 publications

“…This work extends prior work of the authors [8] in two directions: (a) we introduce a new type of reinforcement-learning dynamics that admits faster adjustment towards better allocations, and (b) evaluation is performed over a real-world application, that is a parallelized implementation of the Ant-Colony Optimization metaheuristic.…”

Section: Introduction (mentioning)

confidence: 78%

“…In comparison to [8], the difference lies in the reinforcement direction. As Equation (4) dictates, the strategy vector is only adjusted when a performance is higher than the running-average performance ū_i, which provides a faster adjustment towards better assignments.…”

Section: Each Thread Is Assigned a Performance Index That Coincides W… (mentioning)

confidence: 87%
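
The one-sided adjustment described in the statement above can be illustrated with a minimal sketch. Equation (4) is not reproduced in this listing, so the update rule below is an assumption, not the authors' exact dynamics: it moves probability mass toward the chosen CPU only when the measured performance exceeds the running average, and otherwise leaves the strategy vector untouched. The names `update_strategy`, `choose_cpu`, and `step` are hypothetical, and updating the running average on every measurement is also an assumption.

```python
import random


def choose_cpu(x, rng=random):
    """Sample a CPU index according to the thread's strategy vector x."""
    return rng.choices(range(len(x)), weights=x, k=1)[0]


def update_strategy(x, chosen, perf, avg_perf, step=0.1):
    """One-sided reinforcement update for a single thread (illustrative only).

    x        -- probability vector over CPUs (sums to 1)
    chosen   -- index of the CPU the thread was pinned to
    perf     -- performance measured during the last interval
    avg_perf -- running-average performance of this thread
    Returns (new_x, new_avg_perf).
    """
    gain = perf - avg_perf
    # Assumption: the running average tracks every measurement.
    new_avg = avg_perf + step * gain
    if gain <= 0:
        # Performance did not beat the running average: keep the strategy.
        return list(x), new_avg
    # Clamp the weight so the result stays a valid probability vector.
    w = min(1.0, step * gain)
    new_x = [(1.0 - w) * p for p in x]
    new_x[chosen] += w
    return new_x, new_avg
```

For example, starting from `x = [0.5, 0.5]` with the thread pinned to CPU 0, a measurement above the running average shifts probability toward CPU 0, while a measurement below it changes only the running average.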

“…The convergence properties of this class of dynamics can be derived following the exact same reasoning used for the learning dynamics presented in [8]. In fact, it can be shown that the dynamics approach asymptotically a set of allocations which includes the solutions of the centralized optimization (1).…”

Section: Each Thread Is Assigned a Performance Index That Coincides W… (mentioning)

confidence: 98%

Efficient Dynamic Pinning of Parallelized Applications by Reinforcement Learning with Applications

Chasparis, Rossbory, Janjić (2017), Lecture Notes in Computer Science

Self Cite

Abstract. This paper introduces a resource allocation framework specifically tailored for addressing the problem of dynamic placement (or pinning) of parallelized applications to processing units. Decisions are updated recursively for each thread by a resource manager/scheduler which runs in parallel to the application's threads and periodically records their performances and assigns to them new CPU affinities. For updating the CPU affinities, the scheduler uses a reinforcement-learning algorithm, each branch of which is responsible for assigning a new placement strategy to each thread. The proposed resource allocation framework is flexible enough to address alternative optimization criteria, such as maximum average processing speed and minimum speed variance among threads. We demonstrate the response of the dynamic scheduler under fixed and varying availability of resources (e.g., when other applications are running on the same platform) in a parallel implementation of the Ant-Colony Optimization.

“…Recognizing this need for both learning- and distributed-based optimization, and contrary to the aforementioned references on pinning of parallelized applications, our earlier work [6], [7] proposed a scheduling scheme for optimally allocating threads of a parallelized application that combines both a learning- and a distributed-based optimization. It requires a minimum information exchange, where only measurements collected from each running thread are needed.…”

Section: Related Work and Contributions (mentioning)

confidence: 99%

“…In our previous work [6], [7], we have proposed a reinforcement-learning-based distributed scheduling framework (PaRLSched), adapted to Uniform Memory Architectures (UMA). In this paper, our goal is to provide a generalized methodology that also extends to Non-Uniform Memory Architectures (NUMA).…”

Section: Introduction (mentioning)

confidence: 99%

Learning-Based Dynamic Pinning of Parallelized Applications in Many-Core Systems

Chasparis, Rossbory, Janjić, et al. (2019), 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Self Cite

Motivated by the need for adaptive, secure and responsive scheduling in a great range of computing applications, including human-centered and time-critical applications, this paper proposes a scheduling framework that seamlessly adds resource-awareness to any parallel application. In particular, we introduce a learning-based framework for dynamic placement of parallel threads to Non-Uniform Memory Access (NUMA) architectures. Decisions are taken independently by each thread in a decentralized fashion that significantly reduces computational complexity. The advantage of the proposed learning scheme is the ability to easily incorporate any multi-objective criterion and easily adapt to performance variations during runtime. Under the multi-objective criterion of maximizing total completed instructions per second (i.e., both computational and memory-access instructions), we provide analytical guarantees with respect to the expected performance of the parallel application. We also compare the performance of the proposed scheme with the Linux operating system scheduler in an extensive set of applications, including both computationally and memory intensive ones. We have observed that performance improvement could be significant especially under limited availability of resources and under irregular memory-access patterns.

LBMA and IMAR²: Weighted lottery based migration strategies for NUMA multiprocessing servers

Laso, Lorenzo, Rivera, et al. (2020), Concurrency and Computation

Summary: Multicore NUMA systems present on-board memory hierarchies and communication networks that influence performance when executing shared-memory parallel codes. Characterizing this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance. In this article, monitoring information extracted from hardware counters at runtime is used to characterize the behavior of each thread for an arbitrary number of multithreaded processes running in a multiprocessing environment. This characterization is given in terms of number of operations per second, operational intensity, and latency of memory accesses. We propose a runtime tool, executed in user space, that uses this information to guide two different thread migration strategies for improving execution efficiency by increasing locality and affinity without requiring any modification in the running codes. Different configurations of NAS Parallel OpenMP benchmarks running concurrently on multicore NUMA systems were used to validate the benefits of our proposal, in which up to four processes are running simultaneously. In more than 95% of the executions of our tool, results outperform those of the operating system (OS), with up to 38% improvement in execution time over the OS for heterogeneous workloads, under different and realistic locality and affinity scenarios.
