Abstract. The simulation of realistic medical ultrasound imaging is a computationally intensive task. Although this task may be divided and parallelized, temporal and spatial dependencies make memory bandwidth a bottleneck on performance. In this paper, we report on our implementation of an ultrasound simulator on the Cell Broadband Engine using the Westervelt equation. Our approach divides the simulation region into blocks, and then moves a block, along with its surrounding blocks, through a number of time steps without storing intermediate pressures to memory. Although this increases the amount of floating-point computation, it reduces the total memory traffic over the entire simulation and so improves overall performance. We also analyse how performance may be improved by restricting the simulation to regions that are affected by the transducer output pulse and that influence the final scattered signal received by the transducer.
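The temporal-blocking idea described in the abstract can be illustrated with a minimal sketch (our illustration, not the paper's code): a simple 1-D three-point stencil stands in for the Westervelt update, and a block is advanced through k time steps by reading a halo of k cells on each side, so no intermediate global fields are written back. The function names `step` and `blocked_steps` are hypothetical.

```python
import numpy as np

def step(u):
    # one explicit stencil update (3-point weighted average on the interior);
    # stands in for one time step of the wave-equation solver
    v = u.copy()
    v[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
    return v

def blocked_steps(u, start, size, k):
    # advance only the block [start, start+size) through k time steps.
    # A halo of k cells on each side is read once, and intermediate
    # pressures stay in the local buffer instead of going to memory.
    lo = max(start - k, 0)
    hi = min(start + size + k, len(u))
    local = u[lo:hi].copy()
    for _ in range(k):
        local = step(local)
    # stale halo values contaminate at most k cells inward per side,
    # so the block interior matches the globally stepped result
    return local[start - lo : start - lo + size]
```

The trade-off matches the abstract: halo cells are recomputed redundantly (more floating-point work), but each grid value is loaded once per k steps instead of once per step.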
Abstract. The realistic simulation of ultrasound wave propagation is computationally intensive. The large grid size and the low degree of data reuse place a great demand on memory bandwidth. Graphics Processing Units (GPUs) have attracted attention for scientific calculations due to their potential for efficiently performing large numbers of floating-point computations. However, many applications are limited by memory bandwidth, especially for data sets larger than the GPU's device memory. This problem is only partially mitigated by the standard technique of breaking the grid into regions and overlapping the computation of one region with the host-device memory transfer of another. In this paper, we implement a memory-bound GPU-based ultrasound simulation and evaluate a technique for improving performance by compressing the data into a fixed-point representation, which reduces the time required for host-device transfers. We demonstrate a speedup of 1.5 times on a simulation where the data is broken into regions that must be copied back and forth between the CPU and GPU. We develop a model that can be used to determine the amount of temporal blocking required to achieve near-optimal performance without extensive experimentation. This technique may also be applied to GPU-based scientific simulations in other domains, such as computational fluid dynamics and electromagnetic wave simulation.
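A minimal NumPy sketch of the fixed-point compression idea mentioned above (our illustration under assumed parameters, not the paper's implementation): 32-bit floating-point pressures are rescaled into 16-bit integers before transfer, halving the bytes moved across the host-device link, and expanded back on the other side. The names `compress` and `decompress` are hypothetical.

```python
import numpy as np

def compress(p):
    # rescale float32 pressures into signed 16-bit fixed point;
    # the transfer payload (q) is half the size of p
    scale = float(np.max(np.abs(p))) / 32767.0
    if scale == 0.0:
        scale = 1.0  # all-zero field: any scale round-trips exactly
    q = np.round(p / scale).astype(np.int16)
    return q, scale

def decompress(q, scale):
    # expand back to float32 after the host-device transfer
    return q.astype(np.float32) * np.float32(scale)
```

The quantization error is bounded by half the scale step, so the scheme trades a small, controllable loss of precision for a factor-of-two reduction in transfer time on a bandwidth-bound copy.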