We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After discussing the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of damping kernel, search space size, and filter polynomial degree on computational accuracy and effort, before describing the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion, it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind, we include as an example the computation of the 10^2 innermost eigenpairs of a topological insulator matrix of dimension 10^9 arising in quantum physics applications.
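As a concrete illustration of the technique, the sketch below applies a Chebyshev window filter p(A)x through the standard three-term recurrence, with Jackson damping of the expansion coefficients to suppress Gibbs oscillations. This is a minimal sketch, not the paper's implementation: it assumes the matrix has already been shifted and scaled so that its spectrum lies in [-1,1], and the spmv callback and function names are illustrative.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-supplied sparse matrix-vector product y = A*x;
 * A must already be shifted and scaled so that its spectrum lies in
 * [-1,1], the standard prerequisite for Chebyshev expansions. */
typedef void (*spmv_fn)(const void *A, const double *x, double *y, int n);

/* y = p(A) x, where p is the degree-N Chebyshev expansion of the
 * window (indicator) function of [a,b] in (-1,1), damped with the
 * Jackson kernel to suppress Gibbs oscillations. */
void cheb_filter_apply(const void *A, spmv_fn spmv, int n, int N,
                       double a, double b, const double *x, double *y)
{
    double ta = acos(a), tb = acos(b);        /* tb < ta since a < b */
    double *t0 = malloc(n * sizeof *t0);      /* T_{k-1}(A) x */
    double *t1 = malloc(n * sizeof *t1);      /* T_k(A)     x */
    double *t2 = malloc(n * sizeof *t2);      /* T_{k+1}(A) x */

    memcpy(t0, x, n * sizeof *t0);            /* T_0(A) x = x   */
    spmv(A, x, t1, n);                        /* T_1(A) x = A x */

    double c0 = (ta - tb) / M_PI;             /* k = 0 coefficient; */
    for (int i = 0; i < n; i++)               /* Jackson g_0 = 1    */
        y[i] = c0 * t0[i];

    for (int k = 1; k <= N; k++) {
        /* window expansion coefficient and Jackson damping factor */
        double ck = 2.0 * (sin(k * ta) - sin(k * tb)) / (k * M_PI);
        double gk = ((N - k + 1) * cos(M_PI * k / (N + 1))
                    + sin(M_PI * k / (N + 1)) / tan(M_PI / (N + 1)))
                    / (N + 1);
        for (int i = 0; i < n; i++)
            y[i] += ck * gk * t1[i];
        if (k < N) {                          /* three-term recurrence: */
            spmv(A, t1, t2, n);               /* T_{k+1} = 2 A T_k      */
            for (int i = 0; i < n; i++)       /*           - T_{k-1}    */
                t2[i] = 2.0 * t2[i] - t0[i];
            double *tmp = t0; t0 = t1; t1 = t2; t2 = tmp;
        }
    }
    free(t0); free(t1); free(t2);
}
```

In a full filter diagonalization run, such a filter would be applied repeatedly to a block of search vectors, followed by orthogonalization and a Rayleigh-Ritz step to extract the approximate eigenpairs inside [a,b].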
While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers.
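To make the MPI+X structure concrete, here is a minimal node-level sketch of the "X" part: a thread-parallel CSR sparse matrix-vector multiplication with OpenMP. This is an illustration under simplifying assumptions, not GHOST's actual API or data layout; GHOST uses the SIMD-friendly SELL-C-sigma sparse format, and in a full hybrid code an MPI halo exchange of remote vector entries would precede or overlap this local computation.

```c
/* Minimal CSR storage for the process-local matrix rows. GHOST itself
 * uses the SELL-C-sigma format; plain CSR is used here only to keep
 * the illustration short. */
typedef struct {
    int n;                 /* number of local rows   */
    const int *rowptr;     /* row pointers, size n+1 */
    const int *col;        /* column indices         */
    const double *val;     /* nonzero values         */
} csr_t;

/* Node-level "X" of MPI+X: thread-parallel SpMV over the local rows.
 * Compile with OpenMP enabled (e.g., -fopenmp). */
void spmv_local(const csr_t *A, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (int j = A->rowptr[i]; j < A->rowptr[i+1]; j++)
            sum += A->val[j] * x[A->col[j]];
        y[i] = sum;
    }
}
```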
The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of petascale heterogeneous CPU-GPU implementations of the KPM. At the node level we show that it is possible to decouple the sparse matrix problem posed by KPM from main memory bandwidth both on CPU and GPU. To alleviate the effects of scattered data access we combine loosely coupled outer iterations with tightly coupled block sparse matrix multiple vector operations, which enables pure data streaming. All optimizations are guided by a performance analysis and modeling process that indicates how the computational bottlenecks change with each optimization step. Finally we use the optimized node-level KPM within a hybrid-parallel framework to perform large-scale heterogeneous electronic structure calculations for novel topological materials on a petascale-class Cray XC30 system.

Keywords: Parallel programming, Quantum mechanics, Performance analysis, Sparse matrices

It is widely accepted that future supercomputer architectures will change considerably compared to the machines used at present for large-scale simulations. Extreme parallelism, use of heterogeneous compute devices, and a steady decrease in the architectural balance in terms of main memory bandwidth vs. peak performance are important factors to consider when developing and implementing sustainable code structures. Accelerator-based systems already account for a performance share of 34% of the TOP500 list [1] today, and they may provide first blueprints of future architectural developments. The heterogeneous hardware structure typically calls for completely new software development, in particular if the simultaneous use of all compute devices is addressed to maximize performance and energy efficiency.

A prominent example demonstrating the need for new software implementations and structures is the MAGMA project [2]. In dense linear algebra the code balance (bytes/flop) of basic operations can often be reduced by blocking techniques to better match the machine balance. Thus, this community is expected to achieve high absolute performance also on future supercomputers. In contrast, sparse linear algebra is known for low sustained performance on state-of-the-art homogeneous systems, and the sparse matrix-vector multiplication (SpMV) is often the performance-critical step. Most of the broad research on optimal SpMV data structures has been devoted to driving the balance of a general SpMV (not using any special matrix properties) down to its minimum value of 6 bytes/flop (double precision) or 2.5 bytes/flop (double complex) on all architectures, which is still at least an order of magnitude away from current machine balance numbers. Just recently, the long-known idea of applying the sparse matrix to multiple vectors at the same time (SpMMV; see, e.g., [3]) to reduce the computational balance has gained renewed attention.
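The block operation at the heart of these optimizations, multiplying the sparse matrix by several vectors at once (SpMMV), can be sketched as follows. It reuses the csr_t type from the previous listing; the row-major vector-block layout and the bound m <= 32 are simplifying assumptions for brevity, not the paper's data structures.

```c
#include <stddef.h>

/* SpMMV: Y = A * X for m right-hand sides stored row-major, i.e.
 * entry (i,k) of a block lives at index i*m + k. Each matrix nonzero
 * is loaded from memory once and reused for all m vectors. */
void spmmv_local(const csr_t *A, int m, const double *X, double *Y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A->n; i++) {
        double sum[32] = {0.0};               /* assumes m <= 32 */
        for (int j = A->rowptr[i]; j < A->rowptr[i+1]; j++) {
            double a = A->val[j];
            const double *xr = &X[(size_t)A->col[j] * m];
            for (int k = 0; k < m; k++)       /* reuse a for all m */
                sum[k] += a * xr[k];
        }
        for (int k = 0; k < m; k++)
            Y[(size_t)i * m + k] = sum[k];
    }
}
```

Each nonzero (value plus column index, 12 bytes in double precision) is now fetched once for 2m flops instead of 2, so the matrix-data contribution to the code balance drops from 6 to roughly 6/m bytes/flop, which is the effect the abstract above exploits.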
Block variants of the Jacobi-Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm, but are generally shunned because the total number of floating-point operations increases. In this paper we present the implementation of a block Jacobi-Davidson solver. By detailed performance engineering and numerical experiments we demonstrate that the increase in operations is typically more than compensated by performance gains on modern architectures, giving a method that is both more efficient and more robust than its single-vector counterpart.
We consider the FEAST eigensolver, introduced by Polizzi in 2009 [5]. We describe an improvement that increases the reliability of the algorithm and discuss an application to the solution of eigenvalue problems from graphene modeling.
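For contrast with the polynomial filters discussed above, FEAST applies a rational filter: an approximate spectral projector obtained by numerical quadrature of the resolvent along a contour enclosing the search interval, at the price of solving shifted linear systems. The sketch below is a minimal illustration under stated assumptions (real symmetric A, a hypothetical user-supplied solver callback, a simple midpoint rule on the upper half-circle), not Polizzi's actual algorithm, which uses Gauss quadrature and a Rayleigh-Ritz step on the filtered block.

```c
#include <complex.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical user-supplied direct or iterative solver for the
 * shifted system (z I - A) y = x. FEAST's cost is dominated by these
 * solves, which is why very large matrices that admit no affordable
 * factorization favor polynomial (Chebyshev) filters instead. */
typedef void (*shifted_solve_fn)(const void *A, double complex z,
                                 const double *x, double complex *y,
                                 int n);

/* Apply a FEAST-style rational filter: approximate the spectral
 * projector onto eigenvalues in (lmin, lmax) by midpoint quadrature
 * of the resolvent over the upper half of a circle enclosing the
 * interval. For real symmetric A the lower half-circle contributes
 * the complex conjugate, so only the real part is accumulated. */
void rational_filter_apply(const void *A, shifted_solve_fn solve,
                           int n, int q, double lmin, double lmax,
                           const double *x, double *y)
{
    double c = 0.5 * (lmin + lmax);           /* circle center */
    double r = 0.5 * (lmax - lmin);           /* circle radius */
    double complex *w = malloc(n * sizeof *w);

    for (int i = 0; i < n; i++) y[i] = 0.0;
    for (int j = 0; j < q; j++) {
        double theta = M_PI * (j + 0.5) / q;  /* quadrature node */
        double complex z = c + r * cexp(I * theta);
        solve(A, z, x, w, n);
        for (int i = 0; i < n; i++)
            y[i] += creal(r * cexp(I * theta) * w[i]) / q;
    }
    free(w);
}
```

The need for these shifted solves is exactly why the first abstract notes that Chebyshev filter diagonalization, which requires only matrix-vector products, can reach problem sizes inaccessible to rational function methods.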