The efficient use of multicore architectures for sparse matrix-vector multiplication (SpMV) is currently an open challenge. One algorithm that relies on SpMV is the maximum likelihood expectation maximization (MLEM) algorithm. When MLEM is used for positron emission tomography (PET) image reconstruction, it requires a particularly large matrix. We present a new storage scheme for this type of matrix which cuts the memory requirements by half compared to the widely used compressed sparse row format. For parallelization we combine two partitioning techniques, recursive bisection and striping. Our results show good load balancing and cache behavior. We also give speedup measurements on various modern multicore systems.
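For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal, unoptimized illustration of the compressed sparse row (CSR) layout and of the textbook MLEM update built on it; it is not the new storage scheme, which the abstract does not describe. The function names, the small 3x3 matrix, and the `eps` guard are assumptions made for the example.

```python
from typing import List, Sequence


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_spmv_transposed(values: Sequence[float], col_idx: Sequence[int],
                        row_ptr: Sequence[int], v: Sequence[float],
                        n_cols: int) -> List[float]:
    """Compute z = A^T v by scattering each row's contribution."""
    z = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            z[col_idx[k]] += values[k] * v[i]
    return z


def mlem_step(values, col_idx, row_ptr, measured, image, eps=1e-12):
    """One textbook MLEM update: x <- x / (A^T 1) * A^T (measured / (A x))."""
    n_rows = len(row_ptr) - 1
    forward = csr_spmv(values, col_idx, row_ptr, image)
    # eps is an assumed guard against division by zero, not part of MLEM itself.
    ratio = [measured[i] / max(forward[i], eps) for i in range(n_rows)]
    back = csr_spmv_transposed(values, col_idx, row_ptr, ratio, len(image))
    sens = csr_spmv_transposed(values, col_idx, row_ptr, [1.0] * n_rows, len(image))
    return [image[j] * back[j] / max(sens[j], eps) for j in range(len(image))]


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

x_true = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x_true)
x_next = mlem_step(values, col_idx, row_ptr, y, x_true)
```

Each MLEM iteration needs one product with the matrix and one with its transpose, which is why the storage format and the partitioning of rows across cores dominate the running time. In this sketch, feeding the noise-free measurement `y` back in leaves `x_true` unchanged, as expected for a fixed point of the update.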
This paper compares various contemporary multicore-based microprocessor architectures from different vendors with different memory interconnects regarding performance, speedup, and parallel efficiency. Sparse matrix decomposition is used as a benchmark application. The example matrix used in the experiments comes from an electrical engineering application, where numerical simulation of physical processes plays an important role in the design of industrial products. Within this context, thread-to-core pinning and cache optimization are two important aspects, and both are investigated in more detail.
Even parts of a program that are sequential or just inherently difficult to parallelize can be optimized for instruction-level parallelism (ILP). For instance, eliminating loop overheads and potential pipeline stalls from control flow can alleviate performance bottlenecks. Unfortunately, static compilation is limited in the extent to which it can identify opportunities to apply such optimizations. Generating code dynamically at run time, however, can create much more efficient applications by using information not available at compile time. We demonstrate our approach on a sparse-matrix PET scan code by aggressively unrolling loops and specializing code via dynamic code generation. We leverage task-level parallelism by having an auxiliary processor core concurrently generate code and feed it to the core executing the application. Our approach to fast code generation leverages patching and concatenating prepared code skeletons.
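The idea of specializing a sparse kernel with run-time information can be illustrated with a small sketch. It generates Python source rather than patching machine-code skeletons, so it only mirrors the structure of the technique (full unrolling, with matrix values and column indices baked in as constants) and says nothing about the ILP gains reported in the abstract. The generator name `generate_spmv_kernel` and the example matrix are hypothetical.

```python
from typing import Callable, List, Sequence


def generate_spmv_kernel(values: Sequence[float], col_idx: Sequence[int],
                         row_ptr: Sequence[int]) -> Callable[[Sequence[float]], List[float]]:
    """Emit a fully unrolled y = A x routine specialized to one CSR matrix.

    Assumes all values are finite, so repr() yields valid float literals.
    """
    lines = ["def spmv_specialized(x):", "    return ["]
    for i in range(len(row_ptr) - 1):
        # Every nonzero becomes a literal term; no inner loop or index
        # lookup remains in the generated code.
        terms = [f"{values[k]!r} * x[{col_idx[k]}]"
                 for k in range(row_ptr[i], row_ptr[i + 1])]
        lines.append("        " + (" + ".join(terms) if terms else "0.0") + ",")
    lines.append("    ]")
    namespace: dict = {}
    exec("\n".join(lines), namespace)  # compile the generated source at run time
    return namespace["spmv_specialized"]


# Hypothetical 3x3 example matrix in CSR form:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

kernel = generate_spmv_kernel(values, col_idx, row_ptr)
result = kernel([1.0, 2.0, 3.0])
```

Generation happens once per matrix while the kernel runs many times, which is the kind of split that would let a second core produce code while the first executes it.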