Modern high-performance computer systems continue to grow in size and complexity. Tools that measure application performance in these environments must likewise increase the richness of their measurements to provide insight into the increasingly intricate ways in which software and hardware interact. PAPI (the Performance API) has provided consistent, platform- and operating-system-independent access to CPU hardware performance counters for nearly a decade. Recent trends toward massively parallel multi-core systems, often with heterogeneous architectures, present new challenges for measuring hardware performance information, which is now available not only on the CPU core itself but scattered across the chip and system. We discuss the evolution of PAPI into Component PAPI (PAPI-C), in which multiple sources of performance data can be measured simultaneously through a common software interface. Several example components and component data measurements are discussed. We explore the challenges of hardware performance measurement on existing multi-core architectures and conclude with an exploration of future directions for the PAPI interface.
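To make the component idea concrete, here is a minimal Python sketch under stated assumptions; it is not PAPI's actual C interface. The classes `Component`, `FakeCounterComponent`, and `EventSet`, the simulated event names (`TOT_CYC`, `RX_BYTES`), and the `component:::event` naming scheme are all hypothetical illustrations. The sketch shows only the routing idea: the caller adds qualified event names to one event set, and the set dispatches start and read calls to whichever component owns each counter.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Component(ABC):
    """Hypothetical counter source (CPU core, network card, sensor, ...).
    Illustrative only; not PAPI's real component interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def events(self) -> List[str]: ...

    @abstractmethod
    def start(self, events: List[str]) -> None: ...

    @abstractmethod
    def read(self) -> Dict[str, int]: ...


class FakeCounterComponent(Component):
    """Simulated component: each read() advances every active counter by a fixed rate."""

    def __init__(self, name: str, rates: Dict[str, int]) -> None:
        super().__init__(name)
        self._rates = rates
        self._values: Dict[str, int] = {}

    def events(self) -> List[str]:
        return list(self._rates)

    def start(self, events: List[str]) -> None:
        self._values = {e: 0 for e in events}  # zero only the requested counters

    def read(self) -> Dict[str, int]:
        for e in self._values:
            self._values[e] += self._rates[e]
        return dict(self._values)


class EventSet:
    """One front end over many components, keyed by hypothetical 'component:::event' names."""

    def __init__(self, components: List[Component]) -> None:
        self._owner = {f"{c.name}:::{e}": (c, e) for c in components for e in c.events()}
        self._active: Dict[Component, List[str]] = {}

    def add(self, qualified: str) -> None:
        comp, event = self._owner[qualified]  # raises KeyError for an unknown event
        self._active.setdefault(comp, []).append(event)

    def start(self) -> None:
        for comp, events in self._active.items():
            comp.start(events)

    def read(self) -> Dict[str, int]:
        # Each active component is read once; results are merged into one dict.
        return {f"{comp.name}:::{e}": v
                for comp in self._active
                for e, v in comp.read().items()}
```

For example, with a `cpu` component exposing `TOT_CYC` and a `net` component exposing `RX_BYTES`, adding `cpu:::TOT_CYC` and `net:::RX_BYTES`, calling `start()`, and then `read()` returns both counters in a single dictionary even though they come from different sources.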
Two remarkable features, ternary values and a massive unit with thousands of bits of parallel computation, will make a ternary optical computer (TOC) equipped with a modified signed-digit (MSD) adder more powerful and efficient than ever before for numerical calculations. Based on the previously presented decrease-radix design, a TOC equipped with a prepared adder can serve both users who require huge capacity for data calculations and those with a moderate amount of data. Furthermore, with pipelined operation and the proposed data-editing technique, the efficiency of the prepared adder can be greatly improved, so that each result can be obtained within approximately one clock cycle. It is hoped that, building on the MSD adder, a new multiplier, a new divider, and a new matrix operator can be created for the TOC in the near future.
With the current rapid increase in the complexity of computer architectures, the power consumption of large-scale systems has risen prohibitively, and much attention has been focused on reducing it in different ways. One approach is to use an optical computer, whose non-electronic characteristics include high speed, parallelism, multi-valued logic, and low power consumption. Considering these properties, researchers have focused mainly on improving the operating speed [1-3] and enlarging the number of parallel bits [4][5][6] of such computers, but have often neglected the problem of reducing power consumption.
A TOC prototype recently developed in our laboratory at Shanghai University is a typical optical computer with a huge number of data bits [6,7]. Based on the decrease-radix design proposed in 2008 [8], any number of its bits can be configured at any time as specific groups of tri-valued logic units.
However, because an adder in a TOC contains thousands of bits, the ripple-carry technique is infeasible owing to its severe carry delay. In addition, the look-ahead carry technique is poorly suited to construction from optical elements because of the high complexity of its tree-type architecture. For these reasons, we proposed a new technique called the direct parallel carry channel (DPCC) to accelerate the carry operation [9]. Unfortunately, this scheme could not be put into practice, for various reasons.
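The carry problem described above is what MSD arithmetic sidesteps: with digits in {-1, 0, 1}, addition can be arranged so that a transfer never travels more than one position. The Python sketch below uses the textbook carry-free rule for radix-2 signed digits, in which each position sum p_i is split into a transfer t_{i+1} and an interim digit w_i, with the split chosen by looking at the sign of p_{i-1}. It illustrates carry-free MSD addition in general, not the transformation tables or optical implementation of the TOC adder; the helper names `msd_add`, `int_to_msd`, and `msd_to_int` are hypothetical.

```python
from typing import List


def msd_to_int(digits: List[int]) -> int:
    """Value of an MSD number stored least-significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def int_to_msd(n: int, width: int) -> List[int]:
    """Encode n with digits in {-1, 0, 1}, LSD first.
    Uses the binary magnitude with the sign applied to every digit,
    which is one valid (non-unique) MSD encoding."""
    sign = -1 if n < 0 else 1
    m = abs(n)
    digits = []
    for _ in range(width):
        digits.append(sign * (m & 1))
        m >>= 1
    if m:
        raise ValueError("width too small for n")
    return digits


def msd_add(x: List[int], y: List[int]) -> List[int]:
    """Carry-free addition of two equal-length MSD numbers (LSD first).
    Returns len(x) + 1 digits, all in {-1, 0, 1}."""
    if len(x) != len(y):
        raise ValueError("operands must have equal length")
    n = len(x)
    p = [a + b for a, b in zip(x, y)]  # position sums, each in [-2, 2]
    t = [0] * (n + 1)                  # t[i] = transfer into position i
    w = [0] * n                        # interim digit kept at position i
    for i in range(n):
        # Only p[i] and p[i-1] are read, so all positions are independent.
        lower_nonneg = i == 0 or p[i - 1] >= 0  # incoming transfer is then 0 or +1
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
        elif p[i] == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        # p[i] == 0 leaves t[i + 1] = w[i] = 0
    s = [w[i] + t[i] for i in range(n)] + [t[n]]
    if any(d not in (-1, 0, 1) for d in s):
        raise AssertionError("a second carry would be needed")  # never happens
    return s
```

Because each iteration depends only on p[i] and p[i-1], not on transfers computed earlier in the loop, every digit position can be evaluated simultaneously regardless of word length, which is the property that makes MSD addition attractive for a very wide adder.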
This paper describes the application of various search techniques to the problem of automatic empirical code optimization. The search process is a critical aspect of auto-tuning systems, because the large size of the search space and the cost of evaluating candidate implementations make it infeasible to find the true optimum by brute force. We evaluate the effectiveness of Nelder-Mead Simplex, Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization, Orthogonal search, and Random search in terms of the performance of the best candidate found under varying time limits.
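As a rough illustration of this kind of comparison, the sketch below runs two of the listed strategies, random search and simulated annealing, over a small discrete parameter space under the same budget. Everything concrete here is an assumption: a fixed number of cost evaluations stands in for the time limits, `toy_cost` is a synthetic stand-in for measuring a candidate implementation's run time, and the neighbor move, starting temperature, and geometric cooling schedule are arbitrary choices rather than the paper's settings.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]
Cost = Callable[[Point], float]


def toy_cost(p: Point) -> float:
    """Hypothetical 'run time' of a tiling (ti, tj); minimum 1.0 at (32, 8)."""
    ti, tj = p
    return (math.log2(ti) - 5) ** 2 + (math.log2(tj) - 3) ** 2 + 1.0


def random_search(space: Sequence[Sequence[int]], cost: Cost, budget: int,
                  rng: random.Random) -> Tuple[Optional[Point], float]:
    """Sample `budget` points uniformly and keep the best one."""
    best: Optional[Point] = None
    best_cost = math.inf
    for _ in range(budget):
        x = tuple(rng.choice(axis) for axis in space)
        c = cost(x)
        if c < best_cost:
            best, best_cost = x, c
    return best, best_cost


def simulated_annealing(space: Sequence[Sequence[int]], cost: Cost, budget: int,
                        rng: random.Random, t0: float = 1.0,
                        cooling: float = 0.95) -> Tuple[Point, float]:
    """Step between neighboring grid points; accept worse moves with
    probability exp(-delta / T) while T cools geometrically."""
    def point(ix: List[int]) -> Point:
        return tuple(axis[i] for axis, i in zip(space, ix))

    idx = [rng.randrange(len(axis)) for axis in space]
    cur_cost = cost(point(idx))
    best, best_cost = point(idx), cur_cost
    temp = t0
    for _ in range(budget - 1):  # the starting point used one evaluation
        cand = list(idx)
        d = rng.randrange(len(space))
        step = rng.choice((-1, 1))
        cand[d] = min(max(cand[d] + step, 0), len(space[d]) - 1)  # clamp to grid
        c = cost(point(cand))
        if c <= cur_cost or rng.random() < math.exp(-(c - cur_cost) / temp):
            idx, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = point(cand), c
        temp = max(temp * cooling, 1e-12)  # keep T positive to avoid dividing by zero
    return best, best_cost


if __name__ == "__main__":
    space = [[4, 8, 16, 32, 64, 128], [1, 2, 4, 8, 16]]
    for name, search in (("random", random_search), ("annealing", simulated_annealing)):
        best, c = search(space, toy_cost, 20, random.Random(0))
        print(name, best, round(c, 3))
```

Varying the budget argument and recording the best cost found mimics, in miniature, evaluating each method "under varying time limits"; with a real auto-tuner each call to the cost function would compile and time a candidate, which is why the evaluation count matters.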
The PAPI project has defined and implemented a cross-platform interface to the hardware counters available on most modern microprocessors. The interface has gained widespread use and acceptance from hardware vendors, users, and tool developers. This paper reports on experiences with the community-based open-source effort to define the PAPI specification and implement it on a variety of platforms. Collaborations with tool developers who have incorporated support for PAPI are described. Issues related to the interpretation and accuracy of hardware counter data, and to the overhead of collecting this data, are discussed. The paper concludes with implications for the design of the next version of PAPI.