The solution of sparse triangular linear systems is one of the most important building blocks for a large number of science and engineering problems. For this reason, it has been studied steadily for several decades, principally in order to take advantage of emerging parallel platforms. In the context of massively parallel platforms such as GPUs, the standard parallel solution strategy is based on performing a level-set analysis of the sparse matrix, and the kernel included in the NVIDIA CUSPARSE library is the most prominent example of this approach. However, the weak spots of this implementation are the costly analysis phase and the constant synchronizations with the CPU during the solution stage. In previous work, we presented a self-scheduled and synchronization-free GPU algorithm that avoided the analysis phase and the synchronizations of the standard approach. Here, we extend this proposal and show how the level-set information can be leveraged to improve its performance. In particular, we present new GPU solution routines that attack some of the weak spots of the self-scheduled solver, such as the under-utilization of the GPU resources in the case of highly sparse matrices. The experimental evaluation reveals a noticeable runtime reduction over CUSPARSE and the state-of-the-art synchronization-free method.
KEYWORDS
graphics processors (GPUs), level-set analysis, sparse triangular linear systems, synchronization-free methods
INTRODUCTION

Many essential sparse numerical linear algebra algorithms, such as the application of preconditioners based on incomplete factorizations, or the direct solution of sparse linear systems and least squares problems, rely on the solution of sparse triangular linear systems as one of their most important building blocks.1,2 This is the main motivation for the strong attention that this kernel has attracted for many years, as well as the numerous efforts to develop efficient implementations for a variety of parallel platforms.

This operation is challenging from the point of view of its parallelization since, in the general case, the elimination of one unknown (row) depends on the previous elimination of others. In addition, the triangular structure of the matrices can cause load imbalance issues.

There are two main approaches for the parallel solution of sparse triangular systems (the SPTRSV operation). On one side, we find two-stage methods, based on performing an analysis of the sparse matrix prior to the solution stage to determine a scheduling for the elimination of the unknowns that reveals as much parallelism as possible (a minimal sketch of such an analysis is given below). On the other side, there are one-stage methods, based on a self-scheduled pool of tasks, in which some of the tasks have to wait until the data necessary to perform their computations is made available by other tasks. The advantages of one paradigm or the other depend on the characteristics of the particular sparse matrix, so it is impossible to determine which one is best.3

Graphics processors occupy a special place among the most important par...
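To make the row dependencies and the analysis phase mentioned above concrete, the following C sketch computes the level of every row of a lower triangular matrix stored in CSR format. It is an illustrative, hypothetical helper (the function name and argument layout are ours, not taken from the paper or from CUSPARSE); it only assumes a valid lower triangular CSR structure with rows numbered from 0 to n-1.

    /* Minimal illustrative sketch: level-set analysis of a lower triangular
       matrix in CSR format (row_ptr, col_idx).  The level of a row is one more
       than the highest level among the rows it depends on; rows that share a
       level are mutually independent and can be eliminated in parallel, which
       is the scheduling information produced by the analysis phase of
       two-stage SPTRSV solvers. */
    void levelset_analysis(int n, const int *row_ptr, const int *col_idx,
                           int *level)
    {
        for (int i = 0; i < n; ++i) {
            int lvl = 0;
            /* Row i depends on every row j < i for which A(i,j) is nonzero. */
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                int j = col_idx[k];
                if (j < i && level[j] + 1 > lvl)
                    lvl = level[j] + 1;
            }
            level[i] = lvl;
        }
    }

A two-stage solver would then process the levels in increasing order, launching one parallel elimination step per level, whereas the self-scheduled, synchronization-free approach discussed in this work avoids this analysis altogether.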