Data Management on New Hardware 2022
DOI: 10.1145/3533737.3535089

To use or not to use the SIMD gather instruction?

Cited by 5 publications (5 citation statements)
References 24 publications
“…However, this guideline does not always hold, as we have experimentally shown in [21]. The outcome of our comprehensive evaluation was that SIMD registers can be populated with data elements from non-consecutive memory locations using GATHER with (almost) the same performance as with data elements from consecutive memory location using LOAD in single-threaded as well as multi-threaded environments.…”
Section: Bounce: Block Concurrent SIMD Concept (mentioning)
confidence: 93%
“…Thus, the performance will probably be worse compared to the state-of-the-art scaling SIMD approach. To overcome that, there are enough optimization knobs, hence we have taken a closer look at one knob as an example, which we already evaluated in more detail in [21]. To be self-contained, we include a specific evaluation result in this article.…”
Section: Performance of the Data Access Pattern (mentioning)
confidence: 99%
“…Armejach et al. [60] optimized stencil applications for SVE. Habich et al. [194] proposed a block-striped data access pattern heavily depending on the Gather operation on GPUs to optimize the overhead of accessing non-consecutive memory locations. Parallelizing stencil applications on GPUs is strongly correlated to SIMD extensions [136], [173], [195]-[198].…”
Section: Stencil Applications (mentioning)
confidence: 99%
“…[283]-[285] Xeon Phi [19], [57], [72], [193], [202]-[204], [256], [267], [282], [286], [287], [287] Intel SSE family, AVX family [59], [115], [115], [151], [152], [174], [192], [194], [242], [255], [267], [282], [288] [201], [291], [292] metrics include speedup, scalability, and efficiency (ratio of achieved throughput to peak performance). Furthermore, some evaluations used bandwidth and cache-related performance counters, especially for memory-bounded applications.…”
Section: Target Platform (mentioning)
confidence: 99%
“…This article is an extended version of [14]. In particular, this article includes an extensive GATHER evaluation and an additional representative example from columnar database systems compared to [14].…”
mentioning
confidence: 99%