Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.

This work describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency.

CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps.

We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.

We believe that CABA is a flexible framework that enables the use of idle resources to improve application performance with different optimizations and perform other useful tasks.
We discuss how CABA can be used, for example, for memoization, prefetching, handling interrupts, profiling, redundant multithreading, and speculative precomputation.
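
To make the bandwidth argument concrete, the following is a minimal host-side sketch of how compressing a line of data reduces the number of bytes that must cross the memory interface. The text above does not name a compression algorithm, so the base-plus-delta scheme here is an illustrative assumption, as are the 128-byte line size and every identifier (`Line`, `CompressedLine`, `compress`, `decompress`). The code runs on the CPU and is not an assist warp; it only shows the kind of work that idle compute units could take on.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sizes chosen for illustration only.
constexpr std::size_t kWordsPerLine = 32;  // 32 x 4-byte words = 128-byte line
using Line = std::array<std::uint32_t, kWordsPerLine>;

struct CompressedLine {
    std::uint32_t base;               // first word of the line
    std::vector<std::int8_t> deltas;  // one signed 1-byte delta per word
    std::size_t size_bytes() const { return sizeof(base) + deltas.size(); }
};

// Returns a compressed line if every word lies within a signed 1-byte
// delta of the base; otherwise reports the line as incompressible.
std::optional<CompressedLine> compress(const Line& line) {
    CompressedLine out{line[0], {}};
    out.deltas.reserve(kWordsPerLine);
    for (std::uint32_t w : line) {
        const std::int64_t d =
            static_cast<std::int64_t>(w) - static_cast<std::int64_t>(out.base);
        if (d < -128 || d > 127) return std::nullopt;
        out.deltas.push_back(static_cast<std::int8_t>(d));
    }
    return out;
}

// Rebuilds the original words by adding each delta back onto the base.
Line decompress(const CompressedLine& c) {
    assert(c.deltas.size() == kWordsPerLine);
    Line line{};
    for (std::size_t i = 0; i < kWordsPerLine; ++i) {
        line[i] = static_cast<std::uint32_t>(
            static_cast<std::int64_t>(c.base) + c.deltas[i]);
    }
    return line;
}
```

Under these assumptions, a line whose words are all close in value (for example, consecutive indices or neighboring addresses) shrinks from 128 bytes to 36 bytes (a 4-byte base plus 32 one-byte deltas), while a line containing a single outlier is left uncompressed. The trade is extra computation for less data moved, which pays off only when the compute units would otherwise sit idle.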