Abstract-To account for the characteristics and limitations of the human visual system (HVS) in perceiving images, a variety of perceptual quality metrics have been proposed in the literature. Tailoring rate-distortion (RD) optimization to each metric is cumbersome and time-consuming. In this paper, we propose a general RD optimization strategy called "transform domain bounding box" (BB) that adapts easily to different quality metrics for JPEG-like block-based image encoding. First, we define an objective function that is a weighted sum of the l0-norm of the transform coefficients (a proxy for rate) and the distortion computed from the transform domain representation. Next, for a given distortion target τ, we define a don't care region (DCR): the search region of representations with distortion ≤ τ. We then show that the sparsest transform domain representation (lowest encoding rate) inside a BB that tightly contains the DCR can be constructed efficiently. Varying τ induces different DCRs and corresponding BBs, yielding a set of sparse representations with different sparsity counts; the one that best trades off rate and distortion is easily identified as the solution to our objective. We show that the proposed BB strategy can be easily re-targeted to three common quality metrics: MSE, MSE-HVS-M and SSIM. Experimental results show that our BB strategy outperforms unoptimized JPEG compression by up to 1 dB in PSNR when the distortion metric is MSE, by up to 2 dB when the metric is MSE-HVS-M, and by up to 0.005 when the metric is SSIM.
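The BB idea can be illustrated for the MSE case with a minimal sketch. This is not the authors' implementation. It assumes an orthonormal transform, so that the region MSE ≤ τ is a ball of radius sqrt(N·τ) around the original coefficients and the tight axis-aligned box has that same half-width. It ignores quantization, and it assumes the objective takes the form distortion + λ·(l0-norm). The names `sparsest_in_box`, `bb_search`, `lam` and `taus` are hypothetical.

```python
import math


def sparsest_in_box(coeffs, half_width):
    """Sparsest point in the box [c - h, c + h] per coefficient.

    A coefficient can be zeroed exactly when its interval contains 0;
    every other coefficient keeps its original value (zero added distortion).
    """
    return [0.0 if abs(c) <= half_width else c for c in coeffs]


def bb_search(coeffs, lam, taus):
    """Sweep distortion targets and keep the candidate with the lowest cost.

    Assumed cost: MSE + lam * (number of non-zero coefficients).
    Returns (cost, candidate, rate, distortion).
    """
    n = len(coeffs)
    best = None
    for tau in taus:
        # Assumption: orthonormal transform, so MSE <= tau is a ball of
        # radius sqrt(n * tau); the tight bounding box has this half-width.
        h = math.sqrt(n * tau)
        cand = sparsest_in_box(coeffs, h)
        rate = sum(1 for c in cand if c != 0.0)               # l0-norm
        dist = sum((a - b) ** 2 for a, b in zip(coeffs, cand)) / n  # MSE
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, cand, rate, dist)
    return best


if __name__ == "__main__":
    block = [10.0, -3.0, 0.5, 0.2]  # toy transform coefficients
    cost, cand, rate, dist = bb_search(block, lam=1.0, taus=[0.0, 0.1, 4.0])
    print(cand, rate, dist, cost)
```

Because the box contains the ball, a candidate built this way may exceed the target τ. In this sketch that is harmless: the sweep over τ scores each candidate by its actual distortion and rate, and the objective picks among them.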