2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference On 2019
DOI: 10.1109/hpcc/smartcity/dss.2019.00038

Efficient Parallel Sort on AVX-512-Based Multi-Core and Many-Core Architectures

Cited by 12 publications
(9 citation statements)
references
References 16 publications
“…Usually, after sorting the values column-wise the matrix is transposed, so that the sorted column vectors become row vectors [4,5]. We avoid this transposition and start merging with vectorized Bitonic Merge networks on the sorted columns themselves.…”
Section: Sorting Network (mentioning)
confidence: 99%
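The column-wise sorting step mentioned in this statement can be illustrated with a small scalar sketch. It assumes a matrix of 4 vectors with 2 lanes each and a standard 5-comparator sorting network for 4 inputs. The names `coex`, `NETWORK_4` and `sort_columns` are hypothetical and not taken from the cited or citing papers. A real AVX-512 version would replace the lane-wise `min`/`max` with vector instructions.

```python
def coex(a, b):
    """Lane-wise compare-and-exchange of two equal-length vectors.

    Returns (lo, hi): lo holds the smaller value of each lane,
    hi holds the larger value of each lane.
    """
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    return lo, hi


# Standard 5-comparator sorting network for 4 inputs (0-based indices).
# Assumption for illustration; the papers may use a different network.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort_columns(rows):
    """Sort every column of a 4-row matrix by applying the network to whole rows."""
    rows = [list(r) for r in rows]
    for i, j in NETWORK_4:
        rows[i], rows[j] = coex(rows[i], rows[j])
    return rows


matrix = [[7, 2], [1, 8], [5, 4], [3, 6]]
sorted_cols = sort_columns(matrix)
```

Because each comparator acts on whole vectors, all columns are sorted at once; the sorted sequences run down the columns, which is why a transposition is usually applied before merging.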
“…To execute the same modules using vectorized compare-and-exchange operations (coex) in a 4 × 2 matrix, we first swap the adjacent elements of the last two vectors (see Figure 2b left). In the second merge step the compare-and-exchange modules (1,3), (2,4), (5,7) and (6,8) are executed. In the vectorized version (Figure 2b center), we swap the adjacent elements of the second and fourth vectors before executing the two vectorized compare-and-exchange operations.…”
Section: Sorting Network (mentioning)
confidence: 99%
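The modules (1,3), (2,4), (5,7) and (6,8) named in this statement match the second step of a textbook bitonic merge on 8 inputs. The following scalar sketch generates the comparator pairs per step, under the assumption that the statement uses the standard network with 1-based element numbering. The function names are hypothetical, and the vectorized element swaps described in the statement are not modeled here.

```python
def bitonic_merge_steps(n):
    """Return the comparator pairs (1-based) for each step of a bitonic merge.

    n must be a power of two. Step k compares elements that are
    n / 2**(k + 1) positions apart (k starting at 0).
    """
    steps = []
    dist = n // 2
    while dist >= 1:
        steps.append(
            [(i + 1, i + dist + 1) for i in range(n) if (i // dist) % 2 == 0]
        )
        dist //= 2
    return steps


def bitonic_merge(values):
    """Sort a bitonic sequence in ascending order using the steps above."""
    out = list(values)
    for step in bitonic_merge_steps(len(out)):
        for i, j in step:
            if out[i - 1] > out[j - 1]:
                out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out


second_step = bitonic_merge_steps(8)[1]
merged = bitonic_merge([1, 4, 6, 7, 8, 5, 3, 2])
```

Here `second_step` holds the four pairs listed in the statement, and `merged` is the ascending order of the bitonic input.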
“…Yin et al [26] described an efficient parallel sort on AVX-512-based multi-core and many-core architectures. Their approach achieves to sort 1.1 billion floats per second on an Intel KNL (AVX-512).…”
Section: Related Work On Vectorized Sorting Algorithms (mentioning)
confidence: 99%
“…Yin et al [29] described an efficient parallel sort on AVX-512-based multicore and many-core architectures. Their approach achieves to sort 1.1 billion floats per second on an Intel KNL (AVX-512).…”
Section: Related Work On Vectorized Sorting Algorithms (mentioning)
confidence: 99%