Tilman Küstner scite author profile

The efficient use of multicore architectures for sparse matrixvector multiplication (SpMV) is currently an open challenge. One algorithm which makes use of SpMV is the maximum likelihood expectation maximization (MLEM) algorithm. When using MLEM for positron emission tomography (PET) image reconstruction, one requires a particularly large matrix. We present a new storage scheme for this type of matrix which cuts the memory requirements by half, compared to the widelyused compressed sparse row format. For parallelization we combine the two partitioning techniques recursive bisection and striping. Our results show good load balancing and cache behavior. We also give speedup measurements on various modern multicore systems.

show abstract

Sparse matrix operations on several multi-core architectures

Trinitis

Küstner

Weidendorfer

et al. 2010

J Supercomput

View full text Add to dashboard Cite

This paper compares various contemporary multicore-based microprocessor architectures from different vendors with different memory interconnects regarding performance, speedup, and parallel efficiency. Sparse matrix decomposition is used as a benchmark application. The example matrix used in the experiments comes from an electrical engineering application, where numerical simulation of physical processes plays an important role in the design of industrial products.Within this context, thread-to-core pinning and cache optimization are two important aspects which are investigated in more detail.

show abstract

Sparse Matrix Operations on Multi-core Architectures

Trinitis

Küstner

Weidendorfer

et al. 2009

View full text Add to dashboard Cite

Performance optimization by dynamic code transformation

Weidendorfer

Küstner

McKee

2011

View full text Add to dashboard Cite

Even parts of a program that are sequential or just inherently difficult to parallelize can be optimized for ILP. For instance, eliminating loop overheads and potential pipeline stalls from control flow can alleviate performance bottlenecks. Unfortunately, static compilation is limited in the extent to which it can identify opportunities to apply such optimizations. Generating code dynamically at run time, however, create much more efficient applications by usin information not available at compile time. We demonstrate our approach on a sparse-matrix PET scan code by aggressive unrolling loops and specializing code via dynamic code generation. We leverage task-level parallelism by having an auxiliary processor core concurrently generate code and feed it to the core executing the application. Our approach to fast code generation leverages patching and concatenating prepared code skeletons.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Tilman Küstner

Argument Controlled Profiling

Parallel MLEM on Multicore Architectures

Sparse matrix operations on several multi-core architectures

Sparse Matrix Operations on Multi-core Architectures

Performance optimization by dynamic code transformation

Contact Info

Product

Resources

About