Distributed calculation method for large-pixel-number holograms by decomposition of object and hologram planes

Jackin, Boaz Jessie; Matsubara, Hiroaki; Ohkawa, Takenao; Ootsu, Kanemitsu; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

doi:10.1364/ol.39.006867

Cited by 24 publications

(9 citation statements)

References 6 publications


“…The "Final addition" accumulates all the results for different layers. Figure 3 shows the process of hologram computation and 3D image reconstruction using the object decomposition method [5]. In the figure, the input object consists of two 2D object layers, a pigeon and an olive.…”

Section: Algorithm Adaptation for GPU Cluster Implementation
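The "Final addition" step works because hologram computation is linear in the object field: each 2D layer can be transformed on its own and the per-layer results summed. The sketch below assumes a single-FFT Fresnel-style transform per layer (a quadratic phase "chirp" followed by an FFT). The wavelength, pixel pitch, depths, and the helper names `layer_cgh` and `final_addition` are hypothetical and are not taken from the cited papers.

```python
import numpy as np

WAVELENGTH = 532e-9  # assumed wavelength [m] (hypothetical)
PITCH = 8e-6         # assumed pixel pitch [m] (hypothetical)


def layer_cgh(layer, z):
    """Single-FFT Fresnel-style transform of one 2D layer at depth z (sketch)."""
    n_rows, n_cols = layer.shape
    y = (np.arange(n_rows) - n_rows / 2) * PITCH
    x = (np.arange(n_cols) - n_cols / 2) * PITCH
    yy, xx = np.meshgrid(y, x, indexing="ij")
    # Quadratic phase that encodes the layer's depth z
    chirp = np.exp(1j * np.pi * (xx**2 + yy**2) / (WAVELENGTH * z))
    return np.fft.fft2(layer * chirp)


def final_addition(layers, depths):
    """Accumulate per-layer results; valid because the transform is linear."""
    cgh = np.zeros(layers[0].shape, dtype=complex)
    for layer, z in zip(layers, depths):
        cgh += layer_cgh(layer, z)
    return cgh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pigeon = rng.random((64, 64))  # stand-in for the "pigeon" layer
    olive = rng.random((64, 64))   # stand-in for the "olive" layer
    cgh = final_addition([pigeon, olive], [0.10, 0.12])
    print(cgh.shape)  # (64, 64)
```

Because each layer is handled independently, the layers can be assigned to different GPUs or nodes, with only the final sum requiring their results to be gathered.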


Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster

Baba, Watanabe, Jackin, et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite


Takanobu BABA, Fellow, Shinpei WATANABE, Boaz JESSIE JACKIN, Nonmembers, Kanemitsu OOTSU, Takeshi OHKAWA, Takashi YOKOTA, Members, Yoshio HAYASAKI, and Toyohiko YATAGAI, Nonmembers

SUMMARY The 3D holographic display has long been expected to serve as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements prevent the realization of such displays. A recent study indicates that objects and holograms with several giga-pixels must be processed in real time to achieve high resolution and a wide view angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the way the object is decomposed, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and use of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that the intra-node optimizations attain an 11.52-times speed-up over the original single-node code. Further, multi-node optimizations using 8 nodes with 2 GPUs per node attain an execution time of 4.28 s for generating a 1.6-giga-pixel hologram from a 3.2-giga-pixel object. This corresponds to a 237.92-times speed-up over sequential processing on a CPU and a 41.78-times speed-up over multi-threaded execution on a multicore CPU, both using a conventional FFT-based algorithm.
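The summary names "distribution methods of object data from host node to the other nodes" without detailing them. As a minimal sketch under an assumed static round-robin scheme, the host could assign object tiles (sub-objects) to the 8 nodes x 2 GPUs = 16 workers used in the experiment so that each worker receives only the data it processes. The function `distribute_tiles` and the 8 x 8 tiling are hypothetical; how the partial results are combined is not described in this summary.

```python
from itertools import product


def distribute_tiles(tiles_per_axis, nodes, gpus_per_node):
    """Assign sub-object tiles to (node, gpu) workers in round-robin order.

    Hypothetical static scheme: the host sends each worker only the tiles
    listed for it, and each worker computes its sub-CGHs independently.
    """
    workers = list(product(range(nodes), range(gpus_per_node)))
    assignment = {worker: [] for worker in workers}
    tiles = list(product(range(tiles_per_axis), repeat=2))
    for index, tile in enumerate(tiles):
        assignment[workers[index % len(workers)]].append(tile)
    return assignment


if __name__ == "__main__":
    # 8 nodes x 2 GPUs as in the experiment; the 8 x 8 tiling is an assumption.
    plan = distribute_tiles(tiles_per_axis=8, nodes=8, gpus_per_node=2)
    print(len(plan), "workers;", len(plan[(0, 0)]), "tiles each")  # 16 workers; 4 tiles each
```

Round-robin is chosen here only because it balances tile counts when the tile count is a multiple of the worker count; a real implementation might instead weight tiles by content or overlap transfers with computation.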


Section: Algorithm Adaptation for GPU Cluster Implementation


“…To this problem, we have been working on a distributed algorithm, called a decomposition method, for generating 2D Fourier holograms [5] and 3D Fresnel holograms [6]. In order to estimate the performance of the method, we performed a simulation.…”

Section: Introduction


Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster

Baba, Watanabe, Jackin, et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite


“…The final Layer-add module adds the results for multiple layers. Fig. 3 shows the process of 3D image reconstruction using the object decomposition method [4], adapted to GPU clusters. The input object is first decomposed into sub-objects; the three operations of interpolation, FFT, and shift are applied to each sub-object to produce a sub-CGH; and from each sub-CGH the original sub-object is reconstructed at the original position.…”

Section: Algorithm Adaptation for GPU Cluster Implementation
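The per-sub-object pipeline can be illustrated with the DFT shift theorem. In this minimal sketch the interpolation step is replaced by zero-padding each sub-object to the full hologram size (an assumption; the method's actual interpolation may differ), `np.fft.fft2` plays the role of the FFT, and the shift is a linear phase factor that moves each sub-object's reconstruction from the origin back to its original position. The function name and the square tiling are hypothetical. The sketch shows why summing shifted sub-CGHs reproduces the full-object FFT; it does not reproduce the method's memory or communication savings.

```python
import numpy as np


def decomposed_fourier_hologram(obj, tiles):
    """Sum of shifted sub-CGHs for a tiles x tiles split (tiles must divide the size)."""
    n_rows, n_cols = obj.shape
    th, tw = n_rows // tiles, n_cols // tiles
    ky = np.arange(n_rows).reshape(-1, 1)  # hologram-plane row frequencies
    kx = np.arange(n_cols).reshape(1, -1)  # hologram-plane column frequencies
    cgh = np.zeros((n_rows, n_cols), dtype=complex)
    for i in range(tiles):
        for j in range(tiles):
            sub = obj[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            # FFT of the sub-object placed at the origin, zero-padded to full size
            sub_cgh = np.fft.fft2(sub, s=(n_rows, n_cols))
            # Shift theorem: linear phase relocates the sub-object to (i*th, j*tw)
            phase = np.exp(-2j * np.pi * (ky * i * th / n_rows + kx * j * tw / n_cols))
            cgh += sub_cgh * phase
    return cgh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = rng.random((64, 64))
    print(np.allclose(decomposed_fourier_hologram(obj, tiles=4), np.fft.fft2(obj)))  # True
```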


“…Without this operation all the object segments are reconstructed at the center. For the theoretical background and detailed explanation of the decomposition method as well as the preliminary results of its application to the Fresnel hologram generation, see [4] and [14], respectively.…”

Section: Algorithm Adaptation for GPU Cluster Implementation
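The role of the shift operation can be checked numerically. This sketch assumes the same zero-padding-plus-linear-phase model (hypothetical helpers `reconstruct_tile` and `brightest_pixel`): without the phase factor, a tile's reconstruction collapses onto the origin of the output grid (the center once the conventional `fftshift` is applied for display) instead of returning to the tile's original position.

```python
import numpy as np


def reconstruct_tile(obj, tiles, i, j, apply_shift=True):
    """Build the sub-CGH of tile (i, j) and reconstruct it with an inverse FFT."""
    n_rows, n_cols = obj.shape
    th, tw = n_rows // tiles, n_cols // tiles
    sub = obj[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
    sub_cgh = np.fft.fft2(sub, s=(n_rows, n_cols))  # sub-object placed at the origin
    if apply_shift:
        ky = np.arange(n_rows).reshape(-1, 1)
        kx = np.arange(n_cols).reshape(1, -1)
        sub_cgh = sub_cgh * np.exp(-2j * np.pi * (ky * i * th / n_rows + kx * j * tw / n_cols))
    return np.fft.ifft2(sub_cgh)


def brightest_pixel(field):
    """Row/column index of the largest-magnitude sample."""
    r, c = np.unravel_index(np.argmax(np.abs(field)), field.shape)
    return int(r), int(c)


if __name__ == "__main__":
    obj = np.zeros((16, 16))
    obj[9, 13] = 1.0  # a single bright point inside tile (2, 3) of a 4 x 4 split
    print(brightest_pixel(reconstruct_tile(obj, 4, 2, 3, apply_shift=True)))   # (9, 13)
    print(brightest_pixel(reconstruct_tile(obj, 4, 2, 3, apply_shift=False)))  # (1, 1)
```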


“…To this problem, we have been working on a distributed algorithm for generating 2D Fourier holograms [4] and 3D Fresnel holograms [14] on a GPU cluster. Aiming at the ultimate goal of realizing 3D holographic display with high-resolution and wide view angle properties, this research shows how we resolve the difficulties of large-scale CGH generation on multi-GPU clusters by adapting the FFT-based algorithm to the clusters' environment and applying application-oriented optimizations under the multicore-CPU and multi-GPU combined heterogeneous architecture.…”

Section: Introduction


Overcoming the difficulty of large-scale CGH generation on multi-GPU cluster

Baba, Watanabe, Jackin, et al. 2018

Proceedings of the 11th Workshop on General Purpose GPUs

Self Cite


The 3D holographic display has long been expected to serve as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements prevent the realization of such displays. A recent study indicates that objects and holograms with several giga-pixels must be processed in real time to achieve high resolution and a wide view angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the way the object is decomposed, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and use of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. The experimental results show that the intra-node optimizations attain an 11.52-times speed-up over the original single-node code. Further, multi-node optimizations using 8 nodes with 2 GPUs per node attain an execution time of 4.28 s for generating a 1.6-giga-pixel hologram from a 3.2-giga-pixel object. This corresponds to a 237.92-times speed-up over sequential processing on a CPU using a conventional FFT-based algorithm.

CCS Concepts: • Computer systems organization → Heterogeneous (hybrid) systems; • Human-centered computing → Displays and imagers; • Software and its engineering → Parallel programming languages; • Applied computing → Physical sciences and engineering.
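As a rough check on what the reported speed-up means in absolute terms, the sequential CPU time implied by the two figures in this abstract works out as follows (derived arithmetic only, not a separately reported measurement):

```latex
T_{\mathrm{seq}} \approx 237.92 \times 4.28\ \mathrm{s} \approx 1018\ \mathrm{s} \approx 17\ \mathrm{min}
```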


The High Performance Computing for 3D Dynamic Holographic Simulation Based on Multi-GPU Cluster

Zhang, Lin, Guo 2016

Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems
