2020
DOI: 10.1002/cpe.6018

Comparing unified, pinned, and host/device memory allocations for memory‐intensive workloads on Tegra SoC

Abstract: Edge computing focuses on processing near the source of the data. Edge computing devices using the Tegra SoC architecture provide a physically distinct GPU memory architecture. To take advantage of this architecture, different modes of memory allocation need to be considered, since different GPU memory allocation techniques yield different memory usage and execution times for identical applications on Tegra devices. In this article, we implement several GPU application benchmarks, including…
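As a minimal sketch of the three allocation modes the abstract compares (kernel, names, and sizes are illustrative, not taken from the paper's benchmarks):

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Trivial kernel used to touch a buffer from the device.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // 1. Unified memory: one pointer valid on host and device;
    //    on Tegra SoCs host and GPU share physical memory anyway.
    float* unified;
    cudaMallocManaged(&unified, bytes);
    scale<<<(n + 255) / 256, 256>>>(unified, n);
    cudaDeviceSynchronize();
    cudaFree(unified);

    // 2. Pinned (page-locked) host memory: a host allocation the GPU
    //    can copy from, or access directly, without a staging buffer.
    float* pinned;
    cudaMallocHost(&pinned, bytes);
    // ... fill pinned[] on the host, then copy or map as needed ...
    cudaFreeHost(pinned);

    // 3. Conventional host/device pair: separate allocations with
    //    explicit copies between them.
    float* host = (float*)malloc(bytes);
    float* device;
    cudaMalloc(&device, bytes);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(device, n);
    cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);
    cudaFree(device);
    free(host);
    return 0;
}
```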

Cited by 8 publications (5 citation statements) · References 17 publications

Citation statements (ordered by relevance):
“…where $a_n(j) = (-1)^{\lfloor (B-1-j)/(B-1) \rfloor}\, b_n(j)$, $\lfloor \cdot \rfloor$ is the greatest integer function, and $j \in [0, B-1]$. From (22), it is clear that the filter partial products $a_n(j)$ undergo shift-accumulation for $B$ clock cycles, with sign inversion at the 0th clock cycle. The term $b_n(j)$ can be expressed as…”
Section: A. Inner Product Using TC DA
confidence: 99%
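The quoted shift-accumulate recurrence is straightforward to check in software. Below is a minimal sketch (my own, not taken from the cited paper) of a bit-serial two's-complement distributed-arithmetic inner product, with the sign inversion at clock cycle $j = 0$ exactly as the quoted formula prescribes; all names are illustrative:

```cuda
#include <cstdio>

// Bit-serial DA inner product over B-bit two's-complement inputs x[].
// Cycle j processes bit (B-1-j) of every x[n]; the partial product is
// sign-inverted only at j = 0 (the MSB pass), matching
// a_n(j) = (-1)^floor((B-1-j)/(B-1)) * b_n(j).
int inner_product_da(const int* x, const int* w, int N, int B) {
    int acc = 0;
    for (int j = 0; j < B; ++j) {             // one pass per clock cycle
        int bit_index = B - 1 - j;            // MSB first at j = 0
        int partial = 0;
        for (int n = 0; n < N; ++n) {
            int b = ((unsigned)x[n] >> bit_index) & 1u;  // b_n(j)
            int a = (j == 0) ? -b : b;                   // a_n(j)
            partial += a * w[n];
        }
        acc = (acc << 1) + partial;           // shift-accumulate
    }
    return acc;
}

int main() {
    int x[] = {3, -2, 5};                     // fit in 8-bit two's complement
    int w[] = {1, 4, -1};
    printf("%d\n", inner_product_da(x, w, 3, 8));  // 3*1 + (-2)*4 + 5*(-1) = -10
    return 0;
}
```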
“…Due to the progressive scaling of silicon devices over the past several years, semiconductor memory has become inexpensive, fast, and power-efficient. As per the projections of the International Technology Roadmap for Semiconductors (ITRS) [21], embedded memories will continue to dominate system-on-chip designs; at present they account for more than 90% of total SoC content [22]. It is found that the transistor packing density of SRAM is not only high but also increasing much faster than that of logic devices [23].…”
Section: Introduction
confidence: 99%
“…In Reference 8, the authors implement several GPU applications, including a custom CFD code with unified, pinned, and normal host/device memory allocation modes. They evaluate and compare the memory usage and execution time of such workloads on edge computing Tegra system‐on‐chips (SoC) equipped with integrated GPUs using a shared memory architecture, and non‐SoC machines with discrete GPUs equipped with distinct VRAM.…”
Section: Contents Of the Special Issuementioning
confidence: 99%
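Whether a GPU falls in the first category (integrated, shared memory) or the second (discrete, distinct VRAM) can be checked at run time. A small sketch using the CUDA runtime's device-properties query; the printed labels are my own:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // prop.integrated is non-zero for SoC-style GPUs (e.g., Tegra)
        // that share physical memory with the host; discrete boards
        // with their own VRAM report 0.
        printf("device %d (%s): %s, canMapHostMemory=%d\n",
               d, prop.name,
               prop.integrated ? "integrated (shared memory)"
                               : "discrete (distinct VRAM)",
               prop.canMapHostMemory);
    }
    return 0;
}
```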
“…In [13], the authors perform a similar analysis comparing the sync and async_alloc models, focusing on latency hiding and its effect on runtime performance using two GPUs. The authors in [4] make a comparison between the different CUDA communication models, but on a Tegra SoC-based system where host and device share the same physical memory. None of this work, however, looks at all models or at code generation for them.…”
Section: Related Work
confidence: 99%
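As a point of reference for the sync vs. async_alloc distinction mentioned above, here is a minimal sketch of stream-ordered allocation with the CUDA runtime (cudaMallocAsync/cudaFreeAsync, available since CUDA 11.2); the kernel and sizes are illustrative:

```cuda
#include <cuda_runtime.h>

__global__ void fill(float* p, int n, float v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = v;
}

int main() {
    const int n = 1 << 20;
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // async_alloc model: allocation, work, and free are all ordered on
    // the stream, so allocation latency can overlap other GPU work
    // instead of synchronizing the whole device the way cudaMalloc does.
    float* buf;
    cudaMallocAsync(&buf, n * sizeof(float), stream);
    fill<<<(n + 255) / 256, 256, 0, stream>>>(buf, n, 1.0f);
    cudaFreeAsync(buf, stream);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```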