Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

Lu, Yuechao; Ino, Fumihiko; Hagihara, Kenichi

doi:10.1587/transinf.2016edp7174

Cited by 11 publications

(21 citation statements)

References 24 publications


“…Future work is to deal with large problem sizes that cannot be naively stored in the GPU memory due to memory exhaustion. An out-of-core processing scheme [18], [19] may be useful for realizing this large-scale computation; the scheme divides data into small pieces and iteratively processes the pieces with overlapping GPU computation with CPU-GPU data transfer.…”

Section: Results (mentioning)

confidence: 99%
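The out-of-core scheme described in the statement above (split the data into pieces, process them one at a time, and overlap computation with data transfer) can be illustrated with a minimal CPU-only sketch. This is not the code of either paper: `load_chunk` and `process_chunk` are hypothetical stand-ins for a CPU-GPU transfer and a GPU kernel, and a single background thread is assumed to play the role of the copy engine.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(n_items, chunk_size):
    """Yield (start, stop) index ranges that together cover range(n_items)."""
    for start in range(0, n_items, chunk_size):
        yield start, min(start + chunk_size, n_items)


def load_chunk(data, start, stop):
    # Hypothetical stand-in for a host-to-device transfer of one piece.
    return list(data[start:stop])


def process_chunk(chunk):
    # Hypothetical stand-in for the kernel that works on one resident piece.
    return [x * x for x in chunk]


def out_of_core_map(data, chunk_size):
    """Process data piece by piece, prefetching the next piece during compute."""
    ranges = list(split_into_chunks(len(data), chunk_size))
    results = []
    if not ranges:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_chunk, data, *ranges[0])
        for next_range in ranges[1:]:
            chunk = pending.result()                 # wait for the current piece
            pending = loader.submit(load_chunk, data, *next_range)  # prefetch next
            results.extend(process_chunk(chunk))     # compute overlaps the prefetch
        results.extend(process_chunk(pending.result()))  # last piece
    return results
```

Only two pieces are in flight at any time (the one being processed and the one being loaded), which is what keeps the working set bounded regardless of the total data size.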

Accelerating the Held-Karp Algorithm for the Symmetric Traveling Salesman Problem

Kimura, Higa, Okita, et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite


In this paper, we propose an acceleration method for the Held-Karp algorithm that solves the symmetric traveling salesman problem by dynamic programming. The proposed method achieves acceleration with two techniques. First, we locate data-independent subproblems so that the subproblems can be solved in parallel. Second, we reduce the number of subproblems by a meet-in-the-middle (MITM) technique, which computes the optimal path from both clockwise and counterclockwise directions. We show theoretical analysis on the impact of MITM in terms of the time and space complexities. In experiments, we compared the proposed method with a previous method running on a single-core CPU. Experimental results show that the proposed method on an 8-core CPU was 9.5-10.5 times faster than the previous method on a single-core CPU. Moreover, the proposed method on a graphics processing unit (GPU) was 30-40 times faster than that on an 8-core CPU. As a side effect, the proposed method reduced the memory usage by 48%.

Key words: symmetric traveling salesman problem, Held-Karp algorithm, parallelization, meet in the middle, GPU
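For readers unfamiliar with the baseline, the following is a minimal sketch of the textbook, sequential Held-Karp dynamic program that the abstract refers to. It is not the authors' accelerated method: it includes neither the parallelization nor the MITM reduction, and the function name `held_karp` and the bitmask encoding of subsets are choices made here for illustration.

```python
from itertools import combinations


def held_karp(dist):
    """Return the length of the shortest tour over all cities, starting at city 0.

    dist is a square matrix of pairwise distances (assumed symmetric).
    """
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities whose bits are set in mask, and ends at city j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        # Entries for subsets of a given size depend only on entries for
        # subsets of size - 1, so they are independent of one another.
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev_mask, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

The table holds on the order of n·2^n entries, which is why both the running time and the memory footprint grow exponentially with the number of cities.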



“…The next section discusses the execution of the filtering stage on the CPUs and the following section discusses the execution of the novel FDK algorithm (back-projection stage) on the GPUs. Figure 2 shows our heterogeneous computational flow, which is different from the typical method of using only the GPU for all the computation [10,38,46]. Utilizing the CPU to perform the filtering stage can be more efficient in comparison to using the GPUs to compute the entire FDK pipeline.…”

Section: Proposed Novel FDK Algorithm (mentioning)

confidence: 99%

“…CPUs, GPUs, and Xeon Phi. To improve the data locality, the authors in [38,78] implemented the back-projection kernel on GPUs by organizing the loops as Algorithm 4, such that they compute along the z-axis first. However, that method does not optimize for the layout of arrays Q and I.…”

Section: Improving Data Locality: Data Layout and Loops (mentioning)

confidence: 99%

iFDK

Chen, Wahib, Takizawa, et al. 2019

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis


Computed Tomography (CT) is a widely used technology that requires compute-intense algorithms for image reconstruction. We propose a novel back-projection algorithm that reduces the projection computation cost to 1/6 of the standard algorithm. We also propose an efficient implementation that takes advantage of the heterogeneity of GPU-accelerated systems by overlapping the filtering and back-projection stages on CPUs and GPUs, respectively. Finally, we propose a distributed framework for high-resolution image reconstruction on state-of-the-art GPU-accelerated supercomputers. The framework relies on an elaborate interleave of MPI collective communication steps to achieve scalable communication. Evaluation on a single Tesla V100 GPU demonstrates that our back-projection kernel performs up to 1.6× faster than the standard FDK implementation. We also demonstrate the scalability and instantaneous CT capability of the distributed framework by using up to 2,048 V100 GPUs to solve 4K and 8K problems within 30 seconds and 2 minutes, respectively (including I/O).


“…Zheng et al. [22] presented a cache-aware memory-scheduling scheme for cone beam backprojection [23], [24], which is an inverse volume rendering problem. In other words, a series of 2-D projections are backprojected into 3-D space to reconstruct volume data.…”

Section: Related Work (mentioning)

confidence: 99%

Cache-Aware, In-Place Rotation Method for Texture-Based Volume Rendering

Misaki, Ino, Hagihara 2017

IEICE Trans. Inf. & Syst.

Self Cite


Summary: We propose a cache-aware method to accelerate texture-based volume rendering on a graphics processing unit (GPU) that is compatible with the compute unified device architecture. The proposed method extends a previous method such that it can maximize the average rendering performance while rotating the viewing direction around a volume. To realize this, the proposed method performs in-place rotation of volume data, which rearranges the order of voxels to allow consecutive threads (warps) to refer to voxels with the minimum access strides. Experiments indicate that the proposed method replaces the worst texture cache (TC) hit rate of 42% with the best TC hit rate of 93% for a 1024³-voxel volume. Thus, the average frame rate increases by a factor of 1.6 in the proposed method compared with that in the previous method. Although the overhead of in-place rotation slightly decreases the frame rate from 2.0 frames per second (fps) to 1.9 fps, this slowdown occurs only with a few viewing directions.
