Software Transactional Memory (STM) systems provide an easy-to-use programming model for concurrent code and have proven suitable for parallelizing many applications, providing performance gains with minimal programmer effort. With increasing core counts on modern processors, one would expect increasing benefits. However, we observe that running STM applications on higher core counts is sometimes detrimental to performance, due to the larger number of conflicts that arise with more parallel cores. As the number of cores available on processors steadily rises, a growing number of applications are beginning to exhibit this behavior. In this paper we propose a novel dynamic concurrency control technique which can significantly improve performance (by up to 50%) as well as resource utilization (by up to 85%) for these applications at higher core counts. Our technique borrows ideas from TCP's network congestion control algorithm and uses self-induced concurrency fluctuations to dynamically monitor and match the varying concurrency levels of applications while minimizing global synchronization. For several real-world applications, our flux-based, feedback-driven concurrency control technique fully recovers the performance of the best statically chosen concurrency specification (as chosen by an oracle), regardless of the initial specification. Further, for certain applications our technique improves upon the performance of the oracle-chosen specification by more than 10% through dynamic adaptation to available parallelism. We demonstrate our approach on the STAMP benchmark suite, reporting significant performance and resource utilization benefits, and show significantly better performance than state-of-the-art concurrency control and scheduling techniques. Finally, our technique is programmer-friendly: it requires no changes to application code and no offline phases.
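The TCP-inspired control loop described above can be illustrated with an AIMD (additive-increase, multiplicative-decrease) sketch. This is a minimal, hypothetical illustration, not the paper's actual algorithm: the class names, the abort-rate threshold, and the halving/increment rules are all assumptions chosen to show the general idea of raising concurrency while conflicts are rare and cutting it sharply when they spike.

```python
import threading

class AIMDConcurrencyController:
    """Illustrative TCP-style (AIMD) concurrency controller for an STM
    runtime: the number of threads allowed to run transactions grows
    additively while commits dominate, and shrinks multiplicatively
    when the abort (conflict) rate exceeds a threshold."""

    def __init__(self, max_threads, abort_threshold=0.3):
        self.max_threads = max_threads
        self.abort_threshold = abort_threshold  # tolerated abort fraction (assumed)
        self.allowed = max_threads              # current concurrency level
        self.commits = 0
        self.aborts = 0
        self.lock = threading.Lock()

    def record(self, committed):
        """Each transaction reports whether it committed or aborted."""
        with self.lock:
            if committed:
                self.commits += 1
            else:
                self.aborts += 1

    def adjust(self):
        """Called periodically: AIMD update based on the observed abort rate."""
        with self.lock:
            total = self.commits + self.aborts
            if total == 0:
                return self.allowed
            abort_rate = self.aborts / total
            if abort_rate > self.abort_threshold:
                # Multiplicative decrease: back off quickly under contention.
                self.allowed = max(1, self.allowed // 2)
            else:
                # Additive increase: probe for more available parallelism.
                self.allowed = min(self.max_threads, self.allowed + 1)
            self.commits = self.aborts = 0
            return self.allowed
```

The deliberate up-and-down probing mirrors the "self-induced concurrency fluctuations" the abstract mentions: the controller keeps testing whether the application can sustain more threads, and conflicts provide the feedback signal.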
Today a significant fraction of HPC clusters are built from multi-core machines connected via a high-speed interconnect; hence, they mix shared memory and distributed memory. Current work-stealing algorithms are designed for either a shared-memory architecture or a distributed-memory architecture and are extended to these multi-core clusters by assuming a single underlying architecture. However, as the number of cores per node increases, the differences between the two architectures become more acute, and current work-stealing approaches are ill-suited to multi-core clusters because of this dichotomy in the underlying architecture. We combine the best aspects of both approaches into a new algorithm that allows more efficient execution of large-scale HPC applications, such as UTS, on clusters with large multi-core nodes. As the number of cores per node increases, which is inevitable given today's processor trends, such an approach is crucial.
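One way a hybrid scheme like this can work is hierarchical victim selection: steal from deques on the same node first (cheap, shared memory) and only cross to a remote node when the whole local node is idle (expensive, message passing). The sketch below is an assumption-laden illustration of that idea, not the paper's algorithm; all class and function names are hypothetical.

```python
import random
from collections import deque

class HierarchicalWorker:
    """A worker with its own double-ended task queue, tagged with the
    node it lives on so thieves can prefer intra-node victims."""

    def __init__(self, node_id, worker_id):
        self.node_id = node_id
        self.worker_id = worker_id
        self.deque = deque()  # owner pushes/pops at the bottom (right end)

    def pop_local(self):
        # The owner works from the bottom of its own deque.
        return self.deque.pop() if self.deque else None

def steal(same_node_workers, remote_node_workers):
    """Hierarchical steal: exhaust same-node victims before paying the
    cost of a cross-node steal. Thieves take from the top (oldest task),
    as is conventional in work-stealing deques."""
    for victims in (same_node_workers, remote_node_workers):
        candidates = [w for w in victims if w.deque]
        if candidates:
            victim = random.choice(candidates)
            return victim.deque.popleft()
    return None  # the whole system appears idle
```

The key design choice is that the expensive remote path is taken only after the cheap local search fails, which keeps cross-node traffic proportional to genuine node-level load imbalance.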
The proliferation of mobile devices and mobile clouds, coupled with the multitude of their sensing abilities, is creating interesting possibilities: the sensing capabilities generate data of different types and fidelities in a geographically distributed manner that can be used to build new kinds of peer-to-peer applications. However, the data generated by these mobile devices can be personal and highly confidential. While very interesting possibilities exist for collaborating on the diverse, shared data in real time, privacy policies on data sharing, transport, and usage must be clearly specified and respected. The goal of this work is to introduce a privacy-preserving, data-centric programming model for building collaborative applications in large-scale mobile clouds and to discuss its design. Our work introduces several concepts and leverages privacy annotations and a transparent execution-migration framework to achieve these goals. We also present an evaluation using several applications, demonstrating that overheads are minimal and that the model can be used in a real-time setting.
We demonstrate Pipemizer, an optimizer and recommender aimed at improving the performance of queries or jobs in pipelines. These job pipelines are ubiquitous in modern data analytics due to jobs reading output files written by other jobs. Given that more than 650k jobs run on Microsoft's SCOPE job service per day and about 70% have inter-job dependencies, identifying optimization opportunities across query jobs is of considerable interest to both cluster operators and users. Pipemizer addresses this need by providing recommendations to users, allowing users to understand their system, and facilitating automated application of recommendations. Pipemizer introduces novel optimizations that include holistic pipeline-aware statistics generation, inter-job operator push-up, and job split & merge. This demonstration showcases optimizations and recommendations generated by Pipemizer, enabling users to understand and optimize job pipelines.