We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using subwarps of threads to realize an early compression effect, which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation as well as with three GPU-based implementations. Measurements of the matrix square for 21 sparse matrices show that the proposed method consistently outperforms the other methods. Analysis showed that this performance is achieved by exploiting the compression effect and the GPU caching architecture. Improved performance was also found for computing Galerkin products, which are required by algebraic multigrid solvers. The performance was particularly good for seven-point stencil matrices arising in the context of diffuse optical imaging; this improvement allows image reconstruction at higher resolution using the same computational resources.
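The row-merging idea in this abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU kernel: it runs on the CPU, ignores subwarps and caching, and the names `merge_rows`, `spgemm_row`, and `spgemm` as well as the list-of-`(column, value)` row format are assumptions made for illustration only. It shows how scaled rows of B are merged pairwise, as in merge sort, while entries with equal column indices are summed during the merge.

```python
def merge_rows(x, y):
    """Merge two sparse rows sorted by column, summing duplicate columns."""
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        cx, vx = x[i]
        cy, vy = y[j]
        if cx == cy:
            # Duplicate column index: aggregate on the fly (compression).
            out.append((cx, vx + vy))
            i += 1
            j += 1
        elif cx < cy:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out


def spgemm_row(a_row, b_rows):
    """Compute one row of C = A * B by iteratively merging scaled rows of B."""
    # One scaled copy of row k of B for every nonzero A[i, k].
    pending = [[(c, a * v) for c, v in b_rows[k]] for k, a in a_row]
    if not pending:
        return []
    # Pairwise merging rounds, as in merge sort (hypothetical schedule).
    while len(pending) > 1:
        merged = [merge_rows(pending[p], pending[p + 1])
                  for p in range(0, len(pending) - 1, 2)]
        if len(pending) % 2:
            merged.append(pending[-1])
        pending = merged
    return pending[0]


def spgemm(a_rows, b_rows):
    """Multiply two sparse matrices stored as lists of sorted (col, val) rows."""
    return [spgemm_row(row, b_rows) for row in a_rows]
```

For example, with `A = [[(0, 1.0), (1, 2.0)], [(1, 3.0)]]`, calling `spgemm(A, A)` computes the matrix square used as the benchmark operation in the abstract. Because duplicates are summed inside each merge, intermediate rows never grow beyond the number of distinct columns, which is the compression effect the abstract refers to.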
Many scientific problems, such as classifier training or medical image reconstruction, can be expressed as the minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, a potentially excessive amount of memory is required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size, both in total and per thread. We show how these limitations can be mitigated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed the use of optimized parallel libraries during the forward and reverse passes, which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
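The difference between naive taping and taping a vector operation as a single intrinsic can be sketched in a few lines. This is a toy CPU illustration under stated assumptions, not the authors' AAD software: the classes `Tape` and `Var` and the functions `mul`, `add`, `dot_naive`, and `dot_intrinsic` are hypothetical names, and no GPU library is involved. The naive dot product records one tape entry per scalar operation, whereas the intrinsic version records one entry for the whole vector operation.

```python
class Var:
    """A taped value: either a float or a list of floats (a vector)."""

    def __init__(self, tape, value):
        self.tape = tape
        self.value = value
        # The adjoint has the same shape as the value.
        self.adj = [0.0] * len(value) if isinstance(value, list) else 0.0


class Tape:
    """Records one backward closure per taped operation."""

    def __init__(self):
        self.entries = []

    def var(self, value):
        return Var(self, value)

    def backward(self, out):
        # Seed the output adjoint and replay the tape in reverse order.
        out.adj = 1.0
        for back in reversed(self.entries):
            back()


def mul(a, b):
    """Scalar multiplication, taped as one entry."""
    out = a.tape.var(a.value * b.value)

    def back():
        a.adj += b.value * out.adj
        b.adj += a.value * out.adj

    a.tape.entries.append(back)
    return out


def add(a, b):
    """Scalar addition, taped as one entry."""
    out = a.tape.var(a.value + b.value)

    def back():
        a.adj += out.adj
        b.adj += out.adj

    a.tape.entries.append(back)
    return out


def dot_naive(xs, ys):
    """Dot product over lists of scalar Vars: 2n - 1 tape entries."""
    acc = mul(xs[0], ys[0])
    for x, y in zip(xs[1:], ys[1:]):
        acc = add(acc, mul(x, y))
    return acc


def dot_intrinsic(x, y):
    """Dot product of two vector Vars, taped as a single intrinsic entry."""
    out = x.tape.var(sum(p * q for p, q in zip(x.value, y.value)))

    def back():
        for i in range(len(x.value)):
            x.adj[i] += y.value[i] * out.adj
            y.adj[i] += x.value[i] * out.adj

    x.tape.entries.append(back)
    return out
```

In this sketch the intrinsic stores only references to its two input vectors, so the tape grows by one entry regardless of the vector length. In a real system the forward and backward bodies of such an intrinsic could be delegated to an optimized vector library, which is the kind of substitution the abstract describes.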
Small animal micro computed tomography (μCT) is an important tool in cancer research and is used to quantify liver and lung tumors. A type of cancer that is intensively investigated with μCT is hepatocellular carcinoma (HCC). μCT scans acquire projections from different angles of the gantry, which rotates the X-ray source and detector around the animal. Motion of the animal causes inconsistencies between the projections, which lead to artifacts in the resulting image. This is problematic in HCC research, where respiratory motion affects the image quality by causing hypodense intensity at the liver edge and smearing out small structures such as tumors. Dealing with respiratory motion is particularly difficult in a high-throughput setting when multiple mice are scanned together and projection removal by retrospective respiratory gating may compromise image quality and dose efficiency. In mice, inhalation anesthesia leads to regular respiration with short gasps and long phases of negligible motion. Using this effect and an iterative reconstruction which can cope with missing angles, we discard the relatively few projections in which the gasping motion occurs. Moreover, since gated acquisition, i.e., acquiring multiple projections from a single gantry angle, is not a requirement, this method can be applied to existing scans. We applied our method in a high-throughput setting in which four mice with HCC tumors were scanned simultaneously in a multi-mouse bed. To establish a ground truth, we manually selected projections with visible respiratory motion. Our automated intrinsic breathing projection selection achieved an agreement of 97% with manual selection. We reconstructed volumetric images and demonstrated that our intrinsic gating method significantly reduces the hypodense depiction at the cranial liver edge and improves the detectability of small tumors. Furthermore, we show that projection removal in a four-mouse scan discards only 7.5% more projections than in a single-mouse setting, i.e., four-mouse scanning does not substantially compromise dose efficiency or image quality. To the best of our knowledge, no comparable method that combines multi-mouse scans for high throughput, intrinsic respiratory gating, and an available iterative reconstruction has been described for liver tumor imaging before.