Increasing quantities of renewable energy generation have created a need for greater energy storage capacity in power systems. Thermal storage in variable air volume (VAV) heating, ventilation, and air conditioning (HVAC) systems in commercial buildings has been identified as a potentially inexpensive source of grid storage, but its true costs are not known. Recent literature explores the inefficiency associated with providing grid services from these HVAC-based demand response resources by employing a battery analogy to calculate round-trip efficiency (RTE). Results vary significantly across studies, and in some cases reported efficiencies are strikingly low. This paper has three objectives that address these prior results. First, we synthesize and expand on insights in the existing literature by systematically exploring potential causes of the discrepancies in results. We reinforce previous work indicating that baseline modeling may drive differences across studies, and deduce that control accuracy plays a role in the major differences between experiments and simulations. Second, we discuss why the RTE metric is problematic for demand response applications, review another proposed metric, additional energy consumption (AEC), and propose an extension, which we call uninstructed energy consumption (UEC), to evaluate demand response performance. Finally, we explore the merits of the different metrics using experimental data and highlight that UEC is less sensitive to the characteristics of the demand response signal than previously proposed metrics.