Automated fault detection and diagnosis (AFDD) tools are used to identify degradation faults that reduce the performance and life of air-conditioning equipment. A methodology has recently been developed to evaluate the performance of AFDD tools: a library of input data is fed to an AFDD protocol and the results are categorized. This paper describes a study conducted to assess the effect of using various input data sets in such evaluations, where the data sets differ in their distributions of fault type, fault intensity, and operating temperatures. Case study evaluations of three AFDD protocols in current widespread use demonstrate these effects. The paper shows that evaluation results are sensitive to the choice of input data set, and argues that the data sets used in previously published studies should be improved to give higher-fidelity evaluations. It concludes that for AFDD performance evaluation to be meaningful, the fault and operating conditions must be controlled so that they correspond to the anticipated deployment conditions. A related conclusion is that simulation data, rather than laboratory measurement data, is necessary to conduct performance evaluation of AFDD.
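The evaluation methodology summarized above, feeding a library of input samples to an AFDD protocol and categorizing the outcomes, can be sketched as follows. All names, fields, and the toy detection rule are illustrative assumptions, not the actual protocols or data sets studied in the paper.

```python
def evaluate_protocol(protocol, library):
    """Run an AFDD protocol over a library of input samples and tally
    each result into an outcome category against the known ground truth."""
    tally = {"true_positive": 0, "false_negative": 0,
             "false_positive": 0, "true_negative": 0}
    for sample in library:
        detected = protocol(sample["inputs"])          # protocol's verdict
        faulted = sample["fault_intensity"] > 0.0      # ground truth
        if faulted and detected:
            tally["true_positive"] += 1
        elif faulted and not detected:
            tally["false_negative"] += 1
        elif not faulted and detected:
            tally["false_positive"] += 1
        else:
            tally["true_negative"] += 1
    return tally

# Illustrative library: each sample pairs operating conditions with an
# imposed fault type and intensity (the ground truth used for scoring).
# Note that the distribution of fault types, intensities, and operating
# temperatures in this library is exactly what the study varies.
library = [
    {"inputs": {"outdoor_temp_C": 35.0, "superheat_K": 3.0},
     "fault_type": "low_charge", "fault_intensity": 0.2},
    {"inputs": {"outdoor_temp_C": 27.5, "superheat_K": 8.0},
     "fault_type": None, "fault_intensity": 0.0},
]

# Toy rule-based stand-in for a real AFDD protocol: flag low superheat.
def toy_protocol(inputs):
    return inputs["superheat_K"] < 5.0

result = evaluate_protocol(toy_protocol, library)
```

Summary metrics (e.g. detection and false-alarm rates) computed from such a tally depend directly on how the library's fault and operating conditions are distributed, which is why the paper argues those distributions must match the anticipated deployment conditions.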