2D Total Variation Denoising (TVD) is a widely used technique for image denoising. It is also an important nonparametric regression method for estimating functions with heterogeneous smoothness. Recent results have shown the TVD estimator to be nearly minimax rate optimal for the class of functions with bounded variation. In this paper, we complement these worst-case guarantees by investigating the adaptivity of the TVD estimator to functions that are piecewise constant on axis-aligned rectangles. We rigorously show that, when the truth is piecewise constant with few pieces, the ideally tuned TVD estimator performs better than in the worst case. We also study the issue of choosing the tuning parameter. In particular, we propose a fully data-driven version of the TVD estimator which enjoys worst-case risk guarantees similar to those of the ideally tuned TVD estimator.