49th International Conference on Parallel Processing - ICPP 2020
DOI: 10.1145/3404397.3404413

Efficient Block Algorithms for Parallel Sparse Triangular Solve

Cited by 21 publications (7 citation statements)
References 67 publications
“…Ultimately, for each GPU the accumulated communication time of each DAG level is the final communication time. The communication time on each level is estimated using the number of non-overlapped messages in GPU p (lines 30-33). The row reduction follows the same manner.…”
Section: SpTRSV Performance Model for GPUs
confidence: 99%
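The excerpt above describes accumulating a per-level communication estimate into a final per-GPU time. A minimal sketch of such a model, assuming a simple latency-plus-bandwidth (alpha-beta) cost per message (the function and parameter names here are illustrative, not from the cited work):

```python
def estimate_comm_time(msgs_per_level, latency, bytes_per_msg, bandwidth):
    """Accumulate communication time over DAG levels for one GPU.

    msgs_per_level: non-overlapped message counts, one entry per DAG level.
    latency, bandwidth: assumed per-message latency (s) and link bandwidth (B/s).
    """
    total = 0.0
    for n_msgs in msgs_per_level:
        # Each level's cost is driven by its non-overlapped message count;
        # the final time is the sum over all levels.
        total += n_msgs * (latency + bytes_per_msg / bandwidth)
    return total
```

The same accumulation structure would apply to the row reduction mentioned in the excerpt, with its own message counts.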
“…Exploring high performance SpTRSV is becoming ever more crucial on GPU-accelerated architectures. Most existing parallel GPU triangular solvers focus on optimizing single-GPU performance [6, 30-33]. Due to the complex data dependencies in SpTRSV, algorithm optimization has been mainly based on the level-set methods and color-set methods for various parallel architectures.…”
Section: Related Work
confidence: 99%
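The level-set method mentioned above groups unknowns of the triangular system into levels such that all unknowns in one level are mutually independent and can be solved in parallel. A minimal sketch of the standard level-set construction (the input format here is an assumption for illustration):

```python
def build_level_sets(rows):
    """Compute level sets of the SpTRSV dependency DAG.

    rows[i] lists the column indices j < i with a nonzero L[i][j],
    i.e., the unknowns that x[i] depends on.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        # x[i] must wait for the deepest of its dependencies;
        # an unknown with no dependencies sits in level 0.
        level[i] = 1 + max((level[j] for j in rows[i]), default=-1)
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets
```

Levels are then processed one after another with a synchronization between them, which is exactly where the level-count and per-level parallelism trade-off of these methods comes from.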
“…Then we pass the remote access to the node to reduce the number of interconnect communications. Then, for the solve-update phase, we use a similar method to collect all the system-wide left.sum values to solve the component x and update the intermediate data locally for its dependents using the hybrid memory system (lines 28-35). Note that this method still employs device-wide atomic operations to update the intermediate value, as multiple updates from different warps of one PE may happen simultaneously.…”
Section: B. SpTRSV Design with NVSHMEM
confidence: 99%
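The solve-update pattern described above solves each component once its accumulated contributions (the left.sum values) have arrived, then pushes its contribution to dependents; on the GPU those pushes are the atomic updates mentioned in the excerpt. A serial sketch of the same pattern, assuming a push-style column representation (the names `cols`, `left_sum` are illustrative):

```python
def sptrsv_push(cols, diag, b):
    """Serial solve-update sketch for a lower-triangular system L x = b.

    cols[i]: list of (k, L_ki) pairs, the dependents k > i with nonzero L[k][i].
    diag[i]: the diagonal entry L[i][i].
    """
    n = len(b)
    left_sum = [0.0] * n  # accumulated contributions, updated atomically on GPU
    x = [0.0] * n
    for i in range(n):
        # Solve phase: all of x[i]'s contributions are already in left_sum[i].
        x[i] = (b[i] - left_sum[i]) / diag[i]
        # Update phase: push x[i]'s contribution to each dependent row k.
        for k, lki in cols[i]:
            left_sum[k] += lki * x[i]
    return x
```

In a parallel setting, multiple producers may update the same `left_sum[k]` concurrently, which is why the citing work keeps device-wide atomics for this step.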
“…Concurrent data structures are fundamental building blocks for real-world applications. Existing works have proposed various novel data structures to handle the dependencies inside SpTRSV [4], [6]-[10], [34]. For better reuse of the right-hand sides on the Sunway architecture, Wang et al. [4] tile the sparse matrix to control the data flow and explore inter-level parallelism for SpTRSV.…”
Section: Related Work
confidence: 99%