The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failure) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers including Blue Gene/Q, Cray XE6, and Stampede.
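To give a rough feel for the migratability concept mentioned above, the following is a minimal sketch of a CHARM++ chare-array element that opts into runtime load balancing and serializes its state so the runtime can move it between processors. The module name (hello), class name (Worker), and data member are illustrative assumptions, not taken from the paper.

```cpp
// Minimal sketch, assuming a standard CHARM++ build in which "hello.ci"
// declares a 1D chare array "Worker" with entry methods Worker() and work().
#include <vector>
#include "pup_stl.h"        // PUP operators for STL containers
#include "hello.decl.h"

class Worker : public CBase_Worker {
  std::vector<double> state;            // illustrative per-element data
public:
  Worker() { usesAtSync = true; }       // participate in AtSync load balancing
  Worker(CkMigrateMessage *m) {}        // migration constructor

  // pup() serializes the element so the runtime can migrate it (migratability).
  void pup(PUP::er &p) {
    CBase_Worker::pup(p);
    p | state;
  }

  void work() {
    // ... one step of computation on `state` ...
    AtSync();                           // hand control to the load balancer
  }
  void ResumeFromSync() {               // resume after load balancing
    thisProxy[thisIndex].work();
  }
};

#include "hello.def.h"
```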
Deep neural networks (DNNs) have undergone a surge in popularity with consistent advances in the state of the art for tasks including image recognition, natural language processing, and speech recognition. The computationally expensive nature of these networks has led to the proliferation of implementations that sacrifice abstraction for high performance. In this paper, we present Latte, a domain-specific language for DNNs that provides a natural abstraction for specifying new layers without sacrificing performance. Users of Latte express DNNs as ensembles of neurons with connections between them. The Latte compiler synthesizes a program based on the user specification, applies a suite of domain-specific and general optimizations, and emits efficient machine code for heterogeneous architectures. Latte also includes a communication runtime for distributed-memory data parallelism. Using networks described in Latte, we demonstrate 3-6× speedups over Caffe (C++/MKL) on three state-of-the-art ImageNet models executing on an Intel Xeon E5-2699 v3 x86 CPU.
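The "ensembles of neurons with connections between them" abstraction can be sketched generically as below. This is not Latte's actual API or syntax; it is an assumed, illustrative C++ model of how a layer might be declared as a collection of identical neurons plus a connection pattern that a compiler could analyze and lower to optimized loops.

```cpp
// Illustrative only: NOT Latte's actual API, just a sketch of the
// "ensemble of neurons + connection pattern" abstraction described above.
#include <cstddef>
#include <functional>
#include <vector>

// An ensemble is a named collection of identical neurons (one layer).
struct Ensemble {
  std::size_t size;
  std::vector<float> value;            // one activation per neuron
  explicit Ensemble(std::size_t n) : size(n), value(n, 0.0f) {}
};

// A connection maps each output neuron to the input neurons it reads from.
// A DSL compiler can inspect this mapping (fully connected, sliding window,
// shared weights, ...) and synthesize fused, vectorized loops from it.
using Connection =
    std::function<std::vector<std::size_t>(std::size_t out_index)>;

// Fully connected pattern: every output neuron reads every input neuron.
Connection fully_connected(const Ensemble &in) {
  return [n = in.size](std::size_t) {
    std::vector<std::size_t> sources(n);
    for (std::size_t i = 0; i < n; ++i) sources[i] = i;
    return sources;
  };
}
```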
As we move toward exascale machines, both peak power demand and total energy consumption have become prominent challenges. A significant portion of that power and energy consumption is devoted to cooling, which we strive to minimize in this work. We propose a scheme that combines limiting processor temperatures via dynamic voltage and frequency scaling (DVFS) with frequency-aware load balancing to reduce cooling energy consumption and prevent hot spot formation. Our approach is designed particularly for parallel applications, which are typically tightly coupled, and aims to minimize the timing penalty associated with temperature control. This paper describes results from experiments with five different CHARM++ and MPI applications spanning a range of power and utilization profiles, run on a 32-node (128-core) cluster with a dedicated air conditioning unit. The scheme is assessed on three metrics: the ability to control processor temperatures and hence avoid hot spots, minimization of the timing penalty, and cooling energy savings. Our results show cooling energy savings of up to 63 percent, with a timing penalty of only 2-23 percent.
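A minimal sketch of the kind of control loop such a scheme implies is shown below. It is an assumption for illustration: the temperature threshold, frequency step sizes, and the proportional-to-frequency rebalancing rule are hypothetical, not taken from the paper.

```cpp
// Hypothetical sketch of a temperature-capping DVFS controller combined with
// frequency-aware load rebalancing for a tightly coupled parallel application.
#include <algorithm>
#include <vector>

struct Core {
  double temperature;   // degrees C, read periodically from sensors
  double frequency;     // current operating frequency (GHz)
  double load;          // work units assigned to this core
};

// Lower the frequency of cores above the temperature threshold t_max;
// raise it (up to f_max) for cores comfortably below the threshold.
void dvfs_step(std::vector<Core> &cores, double t_max,
               double f_min, double f_max, double f_step) {
  for (auto &c : cores) {
    if (c.temperature > t_max)
      c.frequency = std::max(f_min, c.frequency - f_step);
    else if (c.temperature < t_max - 2.0)
      c.frequency = std::min(f_max, c.frequency + f_step);
  }
}

// Frequency-aware rebalancing: assign each core work proportional to its
// current frequency so slowed (hot) cores do not stall the whole application.
void rebalance(std::vector<Core> &cores, double total_work) {
  double f_sum = 0.0;
  for (const auto &c : cores) f_sum += c.frequency;
  for (auto &c : cores) c.load = total_work * (c.frequency / f_sum);
}
```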
Power and energy efficiency are among the major challenges to achieving exascale computing in the next several years. While chips operating at low voltages have been shown to be highly energy-efficient, low-voltage operation leads to heterogeneity across cores within the microprocessor chip. In this work, we study chips with low-voltage operation and discuss programming systems and performance modeling in the presence of heterogeneity. We propose an integer linear programming (ILP) based approach for selecting an optimal chip configuration that minimizes its energy consumption. We obtain average energy savings of 26% and 10.7% for two HPC mini-applications, miniMD and Jacobi, respectively. We also use the proposed approach to evaluate energy savings under execution-time constraints. These energy savings are significantly greater than those achieved by sub-optimal configurations obtained from heuristics.
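One plausible shape for such an ILP, written as a hedged sketch rather than the paper's exact formulation, selects one operating configuration per core so as to minimize total chip energy subject to an execution-time bound:

```latex
% Illustrative sketch (assumed notation, not necessarily the paper's model):
% x_{i,c} = 1 if core i runs in configuration c (a voltage/frequency setting),
% E_{i,c} and t_{i,c} are the corresponding energy and execution time, and T
% is the execution-time constraint.
\begin{align}
\min_{x} \quad & \sum_{i \in \mathcal{C}ores} \sum_{c \in \mathcal{C}onfigs} E_{i,c}\, x_{i,c} \\
\text{s.t.} \quad & \sum_{c} x_{i,c} = 1 \quad \forall i
  && \text{(exactly one configuration per core)} \\
& t_{i,c}\, x_{i,c} \le T \quad \forall i, c
  && \text{(execution-time constraint)} \\
& x_{i,c} \in \{0,1\} \quad \forall i, c
\end{align}
```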