2016
DOI: 10.1587/transinf.2016edp7174

Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

Abstract: This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organiz…
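The abstract is truncated before the details of the method, so the following is only a minimal sketch of the general out-of-core idea under an assumed partitioning: the volume is split along z into slabs that each fit a device-memory budget, and the slabs are reconstructed one after another. The function name `plan_slabs` and the choice of z-slabs are hypothetical illustrations, not taken from the paper.

```python
def plan_slabs(nx, ny, nz, bytes_per_voxel, budget_bytes):
    """Split the z range [0, nz) into contiguous slabs that each fit in budget_bytes.

    Hypothetical helper: assumes budget_bytes is the device memory left over for
    the volume after projection buffers are allocated, and that slabs are cut
    along z only.
    """
    slice_bytes = nx * ny * bytes_per_voxel  # memory for one z-slice of the volume
    if slice_bytes > budget_bytes:
        raise ValueError("a single z-slice does not fit in the memory budget")
    slices_per_slab = budget_bytes // slice_bytes
    # Each slab is a half-open range (z_start, z_end); the last one may be shorter.
    return [(z0, min(z0 + slices_per_slab, nz)) for z0 in range(0, nz, slices_per_slab)]
```

For example, a 2048³ float32 volume occupies 32 GiB; with a 6 GiB budget for the volume, each slab holds 384 slices, giving six slabs with the last one holding 128 slices.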

Cited by 11 publications (21 citation statements)
References: 24 publications
“…Future work is to deal with large problem sizes that cannot be naively stored in the GPU memory due to memory exhaustion. An out-of-core processing scheme [18], [19] may be useful for realizing this large-scale computation; the scheme divides data into small pieces and iteratively processes the pieces with overlapping GPU computation with CPU-GPU data transfer.…”
Section: Results; citation type: mentioning; confidence: 99%
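The overlap of GPU computation with CPU-GPU transfer described in the quotation can be illustrated with a small timing model. This is a sketch that assumes one copy engine, one compute engine, and a fixed number of device buffers (double buffering when `buffers=2`), and it models only host-to-device uploads; `pipeline_time` and `serial_time` are hypothetical helpers, and the per-chunk times are made-up units, not measurements.

```python
def serial_time(transfer, compute):
    """Total time when every transfer waits for the previous compute (no overlap)."""
    return sum(transfer) + sum(compute)


def pipeline_time(transfer, compute, buffers=2):
    """Finish time when uploads of later chunks overlap computation of earlier ones.

    Assumes a single copy engine (uploads are serialized), a single compute
    engine (kernels are serialized), and `buffers` device buffers, so chunk i
    cannot be uploaded until chunk i - buffers has finished computing.
    """
    t_end, c_end = [], []  # finish times of each chunk's upload and compute
    for i, (tr, cp) in enumerate(zip(transfer, compute)):
        start_tr = t_end[-1] if t_end else 0
        if i >= buffers:
            start_tr = max(start_tr, c_end[i - buffers])  # wait for a free buffer
        t_end.append(start_tr + tr)
        # Compute starts once this chunk's data is on the device and the engine is idle.
        start_cp = max(t_end[i], c_end[-1] if c_end else 0)
        c_end.append(start_cp + cp)
    return c_end[-1] if c_end else 0
```

With four chunks that each take 2 units to upload and 3 to compute, the serial schedule takes 20 units, while double buffering takes 14 (the first upload plus four computes). With a single buffer the pipeline falls back to 20, because no upload can start until the previous chunk is done.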
“…The next section discusses the execution of the filtering stage on the CPUs and the following section discusses the execution of the novel FDK algorithm (back-projection stage) on the GPUs. Figure 2 shows our heterogeneous computational flow, which is different from the typical method of using only the GPU for all the computation [10,38,46]. Utilizing the CPU to perform the filtering stage can be more efficient in comparison to using the GPUs to compute the entire FDK pipeline.…”
Section: Proposed Novel FDK Algorithm; citation type: mentioning; confidence: 99%
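In the standard FDK formulation, the filtering stage that the quotation assigns to the CPUs consists of a cosine pre-weighting of each projection followed by a one-dimensional ramp filter along detector rows. The sketch below uses the standard spatial-domain Ram-Lak kernel and direct convolution for clarity; practical code would normally filter with FFTs. The function names, the centered detector coordinates, and the source-to-axis distance `D` (with the detector assumed to lie at the rotation axis) are assumptions for illustration, not details from the cited papers.

```python
import math


def ramlak_kernel(n_half, du):
    """Spatial-domain Ram-Lak kernel h[k] for k = -n_half..n_half at detector spacing du."""
    h = []
    for k in range(-n_half, n_half + 1):
        if k == 0:
            h.append(1.0 / (4.0 * du * du))
        elif k % 2 == 0:
            h.append(0.0)  # even offsets vanish
        else:
            h.append(-1.0 / (math.pi ** 2 * k * k * du * du))
    return h


def filter_projection(proj, D, du, dv):
    """Cosine-weight and ramp-filter one projection stored as proj[v][u] (hypothetical helper)."""
    nv, nu = len(proj), len(proj[0])
    h = ramlak_kernel(nu - 1, du)  # long enough to cover every offset within a row
    out = []
    for i, row in enumerate(proj):
        v = (i - (nv - 1) / 2) * dv  # centered detector coordinate
        # FDK pre-weighting D / sqrt(D^2 + u^2 + v^2).
        weighted = []
        for j, p in enumerate(row):
            u = (j - (nu - 1) / 2) * du
            weighted.append(p * D / math.sqrt(D * D + u * u + v * v))
        # Discrete convolution along u; the factor du approximates the integral.
        filtered = []
        for j in range(nu):
            acc = 0.0
            for k in range(nu):
                acc += h[(j - k) + (nu - 1)] * weighted[k]
            filtered.append(acc * du)
        out.append(filtered)
    return out
```

Each projection is filtered independently of every other projection, which is what makes this stage straightforward to spread across CPU threads.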
“…CPUs, GPUs, and Xeon Phi. To improve the data locality, the authors in [38,78] implemented the back-projection kernel on GPUs by organizing the loops as Algorithm 4, such that they compute along the z-axis first. However, that method does not optimize for the layout of arrays Q and I.…”
Section: Improving Data Locality: Data Layout and Loops; citation type: mentioning; confidence: 99%
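The z-first loop organization mentioned in the quotation can be sketched for a circular orbit. For a fixed voxel column (x, y) and projection angle, the detector coordinate u and the FDK distance weight do not depend on z, so they are computed once per column and only v changes in the innermost loop. This plain-Python sketch assumes a virtual detector through the rotation axis, centered detector indices, and a cubic volume stored with z as the fastest-varying index; the names are hypothetical and this is not the cited authors' GPU kernel.

```python
import math


def bilinear(q, u_idx, v_idx):
    """Bilinearly sample projection q[v][u] at fractional indices; zero outside the detector."""
    nv, nu = len(q), len(q[0])
    u0, v0 = math.floor(u_idx), math.floor(v_idx)
    fu, fv = u_idx - u0, v_idx - v0
    total = 0.0
    for dvi, wv in ((0, 1 - fv), (1, fv)):
        for dui, wu in ((0, 1 - fu), (1, fu)):
            vi, ui = v0 + dvi, u0 + dui
            if 0 <= vi < nv and 0 <= ui < nu:
                total += wv * wu * q[vi][ui]
    return total


def backproject_one_view(vol, n, voxel, q, beta, D, du, dv):
    """Accumulate one filtered projection q[v][u] into an n^3 volume.

    vol is a flat list indexed (ix * n + iy) * n + iz, so z is contiguous.
    The caller is assumed to scale the final volume by d_beta / 2.
    """
    nv, nu = len(q), len(q[0])
    c = (n - 1) / 2
    cb, sb = math.cos(beta), math.sin(beta)
    for ix in range(n):
        x = (ix - c) * voxel
        for iy in range(n):
            y = (iy - c) * voxel
            t = x * cb + y * sb   # coordinate toward the source
            s = -x * sb + y * cb  # lateral coordinate
            mag = D / (D - t)     # cone-beam magnification
            u_idx = s * mag / du + (nu - 1) / 2  # constant along the z column
            w = mag * mag                        # FDK weight D^2 / (D - t)^2, z-invariant
            base = (ix * n + iy) * n
            for iz in range(n):                  # z innermost: only v changes
                z = (iz - c) * voxel
                v_idx = z * mag / dv + (nv - 1) / 2
                vol[base + iz] += w * bilinear(q, u_idx, v_idx)
```

The inner loop writes consecutive volume elements and reads a single detector column. Because `q` is stored row-major here, however, each step in v strides by a full detector row, which is the kind of array-layout issue the quotation says the z-first approach leaves unaddressed.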
“…Zheng et al [22] presented a cache-aware memory-scheduling scheme for cone beam backprojection [23], [24], which is an inverse volume rendering problem. In other words, a series of 2-D projections are backprojected into 3-D space to reconstruct volume data.…”
Section: Related Work; citation type: mentioning; confidence: 99%
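For reference, the cone beam backprojection described in the quotation can be written in the standard FDK form for a circular source orbit of radius D. The conventions here are assumptions: the virtual detector passes through the rotation axis, β is the source angle, P_β is the measured projection, h is the ramp filter, and * denotes convolution along u.

```latex
\begin{aligned}
t &= x\cos\beta + y\sin\beta, \qquad s = -x\sin\beta + y\cos\beta,\\
u &= \frac{D\,s}{D - t}, \qquad v = \frac{D\,z}{D - t},\\
Q_\beta(u,v) &= \left[\frac{D}{\sqrt{D^2 + u^2 + v^2}}\,P_\beta(u,v)\right] * h(u),\\
f(x,y,z) &\approx \frac{1}{2}\int_0^{2\pi} \frac{D^2}{(D - t)^2}\,Q_\beta(u,v)\,d\beta .
\end{aligned}
```

Because u and the weight D²/(D − t)² depend only on (x, y, β), every voxel in a z column shares them; only v varies with z.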