Praveen Yedlapalli scite author profile

Praveen Yedlapalli

5Publications

51Citation Statements Received

56Citation Statements Given

How they've been cited

How they cite others

149

Affiliations

Kitware (United States), Pennsylvania State University

Publications

Order By: Most citations

Short-Circuiting Memory Traffic in Handheld Platforms

Yedlapalli

Nachiappan

Soundararajan

et al. 2014

View full text Add to dashboard Cite

Handheld devices are ubiquitous in today's world. With their advent, we also see a tremendous increase in deviceuser interactivity and real-time data processing needs. Media (audio/video/camera) and gaming use-cases are gaining substantial user attention and are defining product successes. The combination of increasing demand from these use-cases and having to run them at low power (from a battery) means that architects have to carefully study the applications and optimize the hardware and software stack together to gain significant optimizations. In this work, we study workloads from these domains and identify the memory subsystem (system agent) to be a critical bottleneck to performance scaling. We characterize the lifetime of the "frame-based" data used in these workloads through the system and show that, by communicating at frame granularity, we miss significant performance optimization opportunities, caused by large IP-to-IP data reuse distances. By carefully breaking these frames into sub-frames, while maintaining correctness, we demonstrate substantial gains with limited hardware requirements. Specifically, we evaluate two techniques, flow-buffering and IP-IP short-circuiting, and show that these techniques bring both power-performance benefits and enhanced user experience.

show abstract

Improving bank-level parallelism for irregular applications

Tang

Kandemir

Yedlapalli

et al. 2016

View full text Add to dashboard Cite

Domain knowledge based energy management in handhelds

Nachiappan

Yedlapalli

Soundararajan³

et al. 2015

View full text Add to dashboard Cite

Trading cache hit rate for memory performance

Ding

Kandemir

Guttman

et al. 2014

View full text Add to dashboard Cite

Most of the prior compiler based data locality optimization works target exclusively cache locality optimization, and row-buffer locality in DRAM banks received much less attention. In particular, to the best of our knowledge, there is no single compiler based approach that can improve row-buffer locality in executing irregular applications. This presents a critical problem considering the fact that executing irregular applications in a power and performance efficient manner will be a key requirement to extract maximum benefits from emerging multicore machines and exascale systems. Motivated by these observations, this paper makes the following contributions. First, it presents a compiler-runtime cooperative data layout optimization approach that takes as input an irregular program that has already been optimized for cache locality and generates an output code with the same cache performance but better row-buffer locality (lower number of row-buffer misses). Second, it discusses a more aggressive strategy that sacrifices some cache performance in order to further improve row-buffer performance (i.e., it trades cache performance for memory system performance). The ultimate goal of this strategy is to find the right tradeoff point between cache performance and rowbuffer performance so that the overall application performance is improved. Third, the paper performs a detailed evaluation of these two approaches using both an AMD Opteron based multicore system and a multicore simulator. The experimental results, collected using five real-world irregular applications, show that (i) conventional cache optimizations do not improve row-buffer locality significantly; (ii) our first approach achieves about 9.8% execution time improvement by keeping the number of cache misses the same as a cache-optimized code but reducing the number of row-buffer misses; and (iii) our second approach achieves even higher execution time improvements (13.8% on average) by sacrificing cache performance for additional memory performance.

show abstract

Locality-aware mapping and scheduling for multicores

Ding

Kandemir

Yedlapalli

et al. 2013

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.