Andreas Höfter scite author profile

Andreas Höfter

4 Publications

74 Citation Statements Received

146 Citation Statements Given

Affiliations

FH Aachen, RWTH Aachen University

Publications

GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging

Gremse, Höfter, Schwen et al. 2015, SIAM J. Sci. Comput.

We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using subwarps of threads to realize an early compression effect which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation as well as with three GPU-based implementations. Measurements performed for computing the matrix square for 21 sparse matrices show that the proposed method consistently outperforms the other methods. Analysis showed that the performance is achieved by utilizing the compression effect and the GPU caching architecture. An improved performance was also found for computing Galerkin products which are required by algebraic multigrid solvers. The performance was particularly good for seven-point stencil matrices arising in the context of diffuse optical imaging and the improved performance allows one to perform image reconstruction at higher resolution using the same computational resources.
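The row-merging idea in this abstract can be illustrated with a small sequential sketch. It is not the authors' GPU kernel: it runs on the CPU, merges rows two at a time rather than in small groups handled by subwarps, and the names `Row`, `merge_rows`, `spgemm_row`, and `spgemm` are hypothetical. Each sparse row is assumed to be a list of `(column, value)` pairs sorted by column.

```python
from typing import List, Tuple

# A sparse row: (column index, value) pairs, sorted by column index (assumed layout).
Row = List[Tuple[int, float]]


def merge_rows(x: Row, y: Row) -> Row:
    """Merge two column-sorted rows as in merge sort, summing duplicate columns."""
    out: Row = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i][0] < y[j][0]:
            out.append(x[i])
            i += 1
        elif x[i][0] > y[j][0]:
            out.append(y[j])
            j += 1
        else:
            # Same column in both rows: aggregate on the fly ("compression").
            out.append((x[i][0], x[i][1] + y[j][1]))
            i += 1
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out


def spgemm_row(a_row: Row, b_rows: List[Row]) -> Row:
    """Compute one row of C = A * B by iteratively merging scaled rows of B."""
    # Each nonzero a_ik selects row k of B and scales it by a_ik.
    pending = [[(c, a * v) for c, v in b_rows[k]] for k, a in a_row]
    if not pending:
        return []
    # Merge neighbouring rows pairwise until a single row remains.
    while len(pending) > 1:
        merged = [merge_rows(pending[t], pending[t + 1])
                  for t in range(0, len(pending) - 1, 2)]
        if len(pending) % 2:
            merged.append(pending[-1])  # odd row out is carried to the next round
        pending = merged
    return pending[0]


def spgemm(a: List[Row], b: List[Row]) -> List[Row]:
    """Multiply two sparse matrices given as lists of sorted sparse rows."""
    return [spgemm_row(row, b) for row in a]
```

Because duplicates are summed during each merge, intermediate rows never grow beyond the number of distinct columns, which is the compression effect the abstract refers to. Explicit zeros produced by cancellation are kept in this sketch.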

GPU-accelerated adjoint algorithmic differentiation

Gremse, Höfter, Razik et al. 2016, Computer Physics Communications

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
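The role of the "tape" can be made concrete with a minimal scalar sketch of reverse-mode differentiation. This corresponds roughly to the naive per-scalar taping that the abstract uses as a reference, not to the vectorized GPU approach of the paper; the classes `Tape` and `Var` and the functions `sin` and `gradient` are hypothetical names chosen for illustration.

```python
import math


class Tape:
    """Records, for every variable, the partial derivatives w.r.t. its inputs."""

    def __init__(self):
        self.entries = []  # one entry per variable: list of (input index, partial)

    def record(self, parents):
        self.entries.append(list(parents))
        return len(self.entries) - 1  # index of the newly recorded variable


class Var:
    """A scalar value whose operations are recorded on a tape."""

    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.index = tape.record(parents)

    def __add__(self, other):
        return Var(self.tape, self.value + other.value,
                   [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return Var(self.tape, self.value * other.value,
                   [(self.index, other.value), (other.index, self.value)])


def sin(x):
    """Sine with its partial derivative cos(x) stored on the tape."""
    return Var(x.tape, math.sin(x.value), [(x.index, math.cos(x.value))])


def gradient(tape, output):
    """Backpropagate adjoints from `output` by walking the tape in reverse."""
    adjoints = [0.0] * len(tape.entries)
    adjoints[output.index] = 1.0
    for out_index in range(len(tape.entries) - 1, -1, -1):
        for in_index, partial in tape.entries[out_index]:
            adjoints[in_index] += partial * adjoints[out_index]
    return adjoints


# Usage: f(x, y) = x * y + sin(x), so df/dx = y + cos(x) and df/dy = x.
tape = Tape()
x = Var(tape, 2.0)
y = Var(tape, 3.0)
f = x * y + sin(x)
grads = gradient(tape, f)
df_dx, df_dy = grads[x.index], grads[y.index]
```

Every scalar operation adds one tape entry, so memory grows with the number of operations. Recording a whole vector or matrix operation as a single intrinsic entry, as the abstract describes, is what reduces this cost.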

A Micro-computed Tomography Database and Reference Implementation for Parallel Computations of Trabecular Thickness and Spacing

Nguyen, Höfter, Leonardic et al. 2022, JORS

Intrinsic Respiratory Gating for Simultaneous Multi-Mouse μCT Imaging to Assess Liver Tumors

et al. 2022

Small animal micro computed tomography (μCT) is an important tool in cancer research and is used to quantify liver and lung tumors. A type of cancer that is intensively investigated with μCT is hepatocellular carcinoma (HCC). μCT scans acquire projections from different angles of the gantry which rotates X-ray source and detector around the animal. Motion of the animal causes inconsistencies between the projections which lead to artifacts in the resulting image. This is problematic in HCC research, where respiratory motion affects the image quality by causing hypodense intensity at the liver edge and smearing out small structures such as tumors. Dealing with respiratory motion is particularly difficult in a high throughput setting when multiple mice are scanned together and projection removal by retrospective respiratory gating may compromise image quality and dose efficiency. In mice, inhalation anesthesia leads to a regular respiration with short gasps and long phases of negligible motion. Using this effect and an iterative reconstruction which can cope with missing angles, we discard the relatively few projections in which the gasping motion occurs. Moreover, since gated acquisition, i.e., acquiring multiple projections from a single gantry angle is not a requirement, this method can be applied to existing scans. We applied our method in a high throughput setting in which four mice with HCC tumors were scanned simultaneously in a multi-mouse bed. To establish a ground truth, we manually selected projections with visible respiratory motion. Our automated intrinsic breathing projection selection achieved an accordance of 97% with manual selection. We reconstructed volumetric images and demonstrated that our intrinsic gating method significantly reduces the hypodense depiction at the cranial liver edge and improves the detectability of small tumors. 
Furthermore, we show that projection removal in a four-mouse scan discards only 7.5% more projections than in a single-mouse setting, i.e., four-mouse scanning does not substantially compromise dose efficiency or image quality. To the best of our knowledge, no comparable method that combines multi-mouse scans for high throughput, intrinsic respiratory gating, and an available iterative reconstruction has been described for liver tumor imaging before.
