Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore's law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet-of-Things, to large HPC computing centers, sub-picoJoule/operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliWatt up to the MegaWatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, Big Data Analytics, Deep Learning, and HPC simulations. By combining together into a seamless design transprecision advances in devices, circuits, software tools, and algorithms, we expect to achieve major energy efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims at demolishing the ultraconservative "precise" computing abstraction, replacing it with a more flexible and efficient one, namely transprecision computing.
The Determinant Quantum Monte Carlo (DQMC) method is one of the most powerful approaches for understanding properties of an important class of materials with strongly interacting electrons, including magnets and superconductors. It treats these interactions exactly, but the solution of a system of N electrons must be extrapolated to bulk values. Currently N ≈ 500 is state-of-the-art. Increasing N is required before DQMC can be used to model newly synthesized materials like functional multilayers.DQMC requires millions of linear algebra computations of order N matrices and scales as N 3 . DQMC cannot exploit parallel distributed memory computers efficiently due to limited scalability with the small matrix sizes and stringent procedures for numerical stability. Today, the combination of multisocket multicore processors and GPUs provides widely available platforms with new opportunities for DQMC parallelization.The kernel of DQMC, the calculation of the Green's function, involves long products of matrices. For numerical stability, these products must be computed using graded decompositions generated by the QR decomposition with column pivoting. The high communication overhead of pivoting limits parallel efficiency. In this paper, we propose a novel approach that exploits the progressive graded structure to reduce the communication costs of pivoting. We show that this method preserves the same numerical stability and achieves 70% performance of highly optimized DGEMM on a two-socket six-core Intel processor. We have integrated this new method and other parallelization techniques into QUEST, a modern DQMC simulation package. Using 36 hours on this Intel processor, we are able to compute accurately the magnetic properties and Fermi surface of a system of N = 1024 electrons. This simulation is almost an order of magnitude more difficult than N ≈ 500, owing to the N 3 scaling. This increase in system size will allow, for the first time, the computation of the magnetic and transport properties of layered materials with DQMC. In addition, we show preliminary results which further accelerate DQMC simulations by using GPU processors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.