We describe the implementation of an image reconstruction algorithm using the parallel processing capabilities of graphics processors. We are designing a new breast scanner that will allow simultaneous acquisition of PET and MRI images. The breast scanner is based on the technology of the much smaller RatCAP PET detector. The image reconstruction of the breast scanner poses a significant computing challenge, which we hope to alleviate with the processing power of modern GPUs. We describe the status of the reconstruction and discuss goals and possible future improvements.
I. HIGH-PERFORMANCE COMPUTING ON THE GPU
A new trend is evolving in high-performance computing: the use of commodity graphics processors (GPUs) for massively parallel computing tasks. The processing power of GPUs, whose development is driven by high-end computer gaming, can readily be applied to compute-intensive tasks. Off-the-shelf systems providing multiple teraflops of processing power are available at commodity price levels today.
We have investigated the "Compute Unified Device Architecture" (CUDA) [1] framework, which makes use of NVIDIA GPUs in either a CUDA-enabled graphics card or a dedicated GPU board. Even low-cost graphics cards provide a large number of individual "Multiprocessors", which in turn provide a large number of processing cores, so that hundreds or thousands of parallel execution threads can run at once. At the time of this writing, the newest GPU board can run more than 7000 parallel threads. The CUDA software libraries and development kits are available for free.
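The way many lightweight threads are mapped onto data elements can be illustrated without a GPU. The following is a minimal sketch in plain Python, not CUDA code: it emulates, sequentially, the global-index arithmetic that a one-dimensional CUDA launch uses (blockIdx.x * blockDim.x + threadIdx.x). The helper names launch_1d and scale_kernel, the array size, and the block size are hypothetical and chosen only for illustration.

```python
# Plain-Python emulation of how a 1-D CUDA launch maps threads to data.
# launch_1d and scale_kernel are hypothetical names, not part of CUDA.

def launch_1d(kernel, n_blocks, threads_per_block, *args):
    """Call `kernel` once per (block, thread) pair, sequentially.

    On a GPU these invocations would execute in parallel; here they run
    in a loop so that the index arithmetic can be inspected.
    """
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    # Same global index a CUDA kernel would compute as
    # blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    if i < len(data):  # guard: threads past the end of the data do nothing
        out[i] = data[i] * factor


data = [float(v) for v in range(10)]
out = [0.0] * len(data)
threads_per_block = 4
# Round up so that every element is covered by some thread.
n_blocks = (len(data) + threads_per_block - 1) // threads_per_block
launch_1d(scale_kernel, n_blocks, threads_per_block, data, 2.0, out)
```

The bounds check matters because the number of launched threads (here 3 blocks of 4) is generally larger than the number of data elements (here 10); the surplus threads must not write anywhere.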
II. IMAGE RECONSTRUCTION OF THE BREAST SCANNER
The BNL Breast Scanner (figure 1) [2] is derived from the smaller RatCAP detector [3], a head-mounted PET tomograph designed and built to image the brain of an awake rat. The breast scanner re-uses most of the RatCAP technology, such as the electronics and data readout, but is much larger than the RatCAP tomograph.
Fig. 1. Conceptual design of the BNL breast scanner. In this design, the diameter of the tomograph can be adjusted.
The data of the tomographs are reconstructed using the iterative maximum-likelihood expectation-maximization (MLEM) algorithm [4]. At the core of the algorithm is the system matrix A, which describes the probabilities that the decay photons originating from a particular voxel end up in a given pair of detector elements. The iterative approach adjusts the image data X until the estimated projections AX produce a close approximation of the measured data. Since MLEM converges slowly, over 1000 iterations are required for quantitative accuracy.
M. L. Purschke is with Brookhaven National Laboratory, Upton, NY 11973, USA. S. S. Southekal and B. Ravindranath are with Stony Brook University, Stony Brook, NY.
In the course of the code execution, we calculate a forward projection which can, after some simplifications, be written as
$$A X^{n}$$
In essence, this represents the measured counts per line of response (LOR) that would have been obtained with the activity distribution given in the current image candidate $X^{n}$, where n deno...