2021
DOI: 10.48550/arxiv.2106.14995
Preprint
Leveraging GPU batching for scalable nonlinear programming through massive Lagrangian decomposition

Abstract: We present the implementation of ExaTron, a trust-region Newton algorithm for bound-constrained nonlinear programming problems that runs entirely on multiple GPUs. By avoiding data transfers between CPU and GPU, our implementation eliminates a major performance bottleneck in memory-bound situations, particularly when solving many small problems in batch. We discuss the design principles and implementation details of our kernel function and core operations. Different design choices are justified…
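The batching idea in the abstract — solving many small, independent bound-constrained problems at once rather than one at a time — can be illustrated with a minimal sketch. This is not ExaTron's trust-region Newton method; it is a simplified batched projected-gradient solve for box-constrained quadratics, with all names, sizes, and the step size chosen for illustration only:

```python
# Illustrative sketch only: batched projected-gradient solves for many small
# box-constrained quadratic problems, vectorized the way a GPU batch would be.
# NOT ExaTron's algorithm; problem sizes and parameters are made up.
import numpy as np

def batch_projected_gradient(Q, c, lo, hi, iters=500, step=0.1):
    """Minimize 0.5*x^T Q x + c^T x s.t. lo <= x <= hi for a whole batch.

    Q: (B, n, n) SPD matrices; c, lo, hi: (B, n) arrays.
    """
    B, n = c.shape
    x = np.clip(np.zeros((B, n)), lo, hi)          # feasible start
    for _ in range(iters):
        grad = np.einsum("bij,bj->bi", Q, x) + c   # batched gradient Q x + c
        x = np.clip(x - step * grad, lo, hi)       # project onto the box
    return x

# Tiny batch of B=3 independent 2-variable problems.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2, 2))
Q = np.einsum("bij,bkj->bik", A, A) + 2.0 * np.eye(2)  # SPD per problem
c = rng.standard_normal((3, 2))
lo, hi = -np.ones((3, 2)), np.ones((3, 2))
x = batch_projected_gradient(Q, c, lo, hi)
```

The point of the sketch is the shape of the computation: one vectorized kernel advances every subproblem simultaneously, with no per-problem host/device round trips.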

Cited by 5 publications (5 citation statements)
References 36 publications
“…[flattened survey table of GPU/SIMD topics and application-specific tuned libraries, citing this work under nonlinear programming [270], [271]] Figure 4 shows the distribution of the selected publications over the years. While publications were carefully selected based on relevance, the time distribution provides a good indicator of the progress of SIMD hardware and the associated programming model.…”
Section: Category
confidence: 99%
“…We consider a component-based decomposition of ACOPF [8], [13] that can be efficiently solved by ADMM, where each component in the network (i.e., buses, lines, generators) forms its own subproblem. Although region-based ADMM decompositions [10], [23] are also popular for power systems applications and result in fewer subproblems, the advantage of the component-based formulation is that each subproblem is small and can be solved efficiently, lending itself well to HPC implementations [5]. Furthermore, component-based decompositions do not require making partitioning decisions, which can impact performance.…”
Section: Component-based Decomposition of ACOPF
confidence: 99%
“…With x = [(p_{gi(i)}, q_{gi(i)}, p_{ij(i)}, q_{ij(i)}, p_{ji(i)}, q_{ji(i)}, w_i, θ_i)]_{i∈B}, where (i,j) ∈ L and g_i ∈ G_i, and with proper A, B and c = 0, we have consensus constraints of the form in (2). Applying ADMM to the reformulation permits massively parallel computations that can be accelerated using GPUs [5].…”
Section: B. ADMM Formulation for ACOPF
confidence: 99%
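The consensus form Ax + Bz = c with c = 0 quoted above is the standard ADMM template: each component solves a small local subproblem, and a consensus update ties the copies together. A generic sketch on a toy separable problem (not the ACOPF formulation; all names and values are illustrative) looks like:

```python
# Generic consensus-ADMM sketch on a toy problem, NOT the ACOPF formulation:
# minimize sum_i 0.5*(x_i - a_i)^2  subject to  x_i = z  (consensus),
# i.e. A x + B z = 0 with A = I and B stacking -1's.
import numpy as np

a = np.array([1.0, 2.0, 6.0])    # local data for 3 "components"
rho = 1.0                        # ADMM penalty parameter
x = np.zeros(3); z = 0.0; u = np.zeros(3)   # u: scaled dual variables

for _ in range(100):
    # x-update: each component solves its own small subproblem independently
    # (these are the solves a GPU batch solver would run in parallel).
    x = (a + rho * (z - u)) / (1.0 + rho)
    # z-update: averaging enforces the consensus constraint x_i = z.
    z = np.mean(x + u)
    # dual update on the residual x_i - z.
    u = u + x - z

# For this separable objective the consensus value converges to mean(a) = 3.0.
```

The component subproblems in the x-update are independent, which is exactly what makes the component-based decomposition amenable to batched GPU solves.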
“…Thus, implementing a sparse direct solver on the GPU is nontrivial, and the performance of current GPU-based sparse linear solvers lags far behind that of their CPU equivalents [36,37]. Previous attempts to solve nonlinear problems on the GPU have circumvented this problem by relying on iterative solvers [7,33] or on decomposition methods [23]. Here, we have chosen instead to revisit the original reduced-space algorithm proposed in [10]: this method condenses the KKT system into a dense matrix, whose size is small enough to be factorized efficiently on the GPU with dense direct linear algebra.…”
Section: Introduction
confidence: 99%
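The condensation step described in the last quote — reducing a large, sparse KKT system to a small dense matrix that dense GPU linear algebra handles well — can be sketched with a Schur complement on a toy block system. This is only a schematic illustration, not the reduced-space algorithm of [10]; sizes and values are made up:

```python
# Sketch of condensing a structured KKT system [[H, J^T], [J, 0]] to a small
# dense system via the Schur complement S = J H^{-1} J^T, which is small
# enough for dense factorization (e.g., Cholesky). Toy sizes/values only.
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5                              # many states, few coupling constraints
H = np.diag(rng.uniform(1.0, 2.0, n))     # SPD Hessian block (diagonal toy case)
J = rng.standard_normal((m, n))           # constraint Jacobian (full row rank)
r1, r2 = rng.standard_normal(n), rng.standard_normal(m)

# Condense: S y = J H^{-1} r1 - r2, then recover x = H^{-1}(r1 - J^T y).
S = J @ np.linalg.solve(H, J.T)           # m-by-m dense Schur complement
L = np.linalg.cholesky(S)                 # dense factorization of the small system
rhs = J @ np.linalg.solve(H, r1) - r2
y = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
x = np.linalg.solve(H, r1 - J.T @ y)

# Cross-check against solving the full KKT system directly.
K = np.block([[H, J.T], [J, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r1, r2]))
```

The payoff is that only the m-by-m system is factorized densely; on a GPU this avoids the sparse-factorization bottleneck the quote describes.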