2021
DOI: 10.48550/arxiv.2106.14995
Preprint
Leveraging GPU batching for scalable nonlinear programming through massive Lagrangian decomposition

Abstract: We present the implementation of ExaTron, a trust-region Newton algorithm for bound-constrained nonlinear programming problems that runs entirely on multiple GPUs. By avoiding data transfers between CPU and GPU, our implementation eliminates a major performance bottleneck in memory-bound situations, particularly when solving many small problems in batch. We discuss the design principles and implementation details of our kernel function and core operations. Different design choices are justified…
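The batching idea in the abstract — solving many small, independent bound-constrained problems at once rather than one at a time — can be illustrated with a minimal sketch. This is not ExaTron's trust-region Newton method; it is a simplified batched projected-gradient solve for box-constrained quadratics, with all names, sizes, and the step size chosen for illustration only:

```python
# Illustrative sketch only: batched projected-gradient solves for many small
# box-constrained quadratic problems, vectorized the way a GPU batch would be.
# NOT ExaTron's algorithm; problem sizes and parameters are made up.
import numpy as np

def batch_projected_gradient(Q, c, lo, hi, iters=500, step=0.1):
    """Minimize 0.5*x^T Q x + c^T x s.t. lo <= x <= hi for a whole batch.

    Q: (B, n, n) SPD matrices; c, lo, hi: (B, n) arrays.
    """
    B, n = c.shape
    x = np.clip(np.zeros((B, n)), lo, hi)          # feasible start
    for _ in range(iters):
        grad = np.einsum("bij,bj->bi", Q, x) + c   # batched gradient Q x + c
        x = np.clip(x - step * grad, lo, hi)       # project onto the box
    return x

# Tiny batch of B=3 independent 2-variable problems.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2, 2))
Q = np.einsum("bij,bkj->bik", A, A) + 2.0 * np.eye(2)  # SPD per problem
c = rng.standard_normal((3, 2))
lo, hi = -np.ones((3, 2)), np.ones((3, 2))
x = batch_projected_gradient(Q, c, lo, hi)
```

The point of the sketch is the shape of the computation: one vectorized kernel advances every subproblem simultaneously, with no per-problem host/device round trips.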

Cited by 5 publications (5 citation statements)
References 36 publications
“…[flattened survey table of GPU/SIMD topics and application-specific tuned libraries, citing this work under nonlinear programming [270], [271]] Figure 4 shows the distribution of the selected publications over the years. While publications were carefully selected based on relevance, the time distribution provides a good indicator of the progress of SIMD hardware and the associated programming model.…”
Section: Category
confidence: 99%
“…We consider a component-based decomposition of ACOPF [8], [13] that can be efficiently solved by ADMM, where each component in the network (i.e., buses, lines, generators) forms its own subproblem. Although region-based ADMM decompositions [10], [23] are also popular for power systems applications and result in fewer subproblems, the advantage of the component-based formulation is that each subproblem is small and can be solved efficiently, lending itself well to HPC implementations [5]. Furthermore, component-based decompositions do not require making partitioning decisions, which can impact performance.…”
Section: Component-based Decomposition of ACOPF
confidence: 99%
“…With x = [(p_{gi(i)}, q_{gi(i)}, p_{ij(i)}, q_{ij(i)}, p_{ji(i)}, q_{ji(i)}, w_i, θ_i)]_{i∈B}, where (i,j) ∈ L and g_i ∈ G_i, and with proper A, B and c = 0, we have consensus constraints of the form in (2). Applying ADMM to the reformulation permits massively parallel computations that can be accelerated using GPUs [5].…”
Section: B. ADMM Formulation for ACOPF
confidence: 99%
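The consensus form Ax + Bz = c with c = 0 quoted above is the standard ADMM template: each component solves a small local subproblem, and a consensus update ties the copies together. A generic sketch on a toy separable problem (not the ACOPF formulation; all names and values are illustrative) looks like:

```python
# Generic consensus-ADMM sketch on a toy problem, NOT the ACOPF formulation:
# minimize sum_i 0.5*(x_i - a_i)^2  subject to  x_i = z  (consensus),
# i.e. A x + B z = 0 with A = I and B stacking -1's.
import numpy as np

a = np.array([1.0, 2.0, 6.0])    # local data for 3 "components"
rho = 1.0                        # ADMM penalty parameter
x = np.zeros(3); z = 0.0; u = np.zeros(3)   # u: scaled dual variables

for _ in range(100):
    # x-update: each component solves its own small subproblem independently
    # (these are the solves a GPU batch solver would run in parallel).
    x = (a + rho * (z - u)) / (1.0 + rho)
    # z-update: averaging enforces the consensus constraint x_i = z.
    z = np.mean(x + u)
    # dual update on the residual x_i - z.
    u = u + x - z

# For this separable objective the consensus value converges to mean(a) = 3.0.
```

The component subproblems in the x-update are independent, which is exactly what makes the component-based decomposition amenable to batched GPU solves.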
“…Thus, implementing a sparse direct solver on the GPU is nontrivial, and the performance of current GPU-based sparse linear solvers lags far behind that of their CPU equivalents [36,37]. Previous attempts to solve nonlinear problems on the GPU have circumvented this problem by relying on iterative solvers [7,33] or on decomposition methods [23]. Here, we have chosen instead to revisit the original reduced-space algorithm proposed in [10]: this method condenses the KKT system into a dense matrix, whose size is small enough to be factorized efficiently on the GPU with dense direct linear algebra.…”
Section: Introduction
confidence: 99%
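The condensation step described in the last quote — reducing a large, sparse KKT system to a small dense matrix that dense GPU linear algebra handles well — can be sketched with a Schur complement on a toy block system. This is only a schematic illustration, not the reduced-space algorithm of [10]; sizes and values are made up:

```python
# Sketch of condensing a structured KKT system [[H, J^T], [J, 0]] to a small
# dense system via the Schur complement S = J H^{-1} J^T, which is small
# enough for dense factorization (e.g., Cholesky). Toy sizes/values only.
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5                              # many states, few coupling constraints
H = np.diag(rng.uniform(1.0, 2.0, n))     # SPD Hessian block (diagonal toy case)
J = rng.standard_normal((m, n))           # constraint Jacobian (full row rank)
r1, r2 = rng.standard_normal(n), rng.standard_normal(m)

# Condense: S y = J H^{-1} r1 - r2, then recover x = H^{-1}(r1 - J^T y).
S = J @ np.linalg.solve(H, J.T)           # m-by-m dense Schur complement
L = np.linalg.cholesky(S)                 # dense factorization of the small system
rhs = J @ np.linalg.solve(H, r1) - r2
y = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
x = np.linalg.solve(H, r1 - J.T @ y)

# Cross-check against solving the full KKT system directly.
K = np.block([[H, J.T], [J, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r1, r2]))
```

The payoff is that only the m-by-m system is factorized densely; on a GPU this avoids the sparse-factorization bottleneck the quote describes.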