This paper shows that a rigorous review of simulation methods for thermoelectric heat pumps in nearly-zero energy buildings is needed, since the literature reports inconsistent results in the verification and validation of simulation models. Statistical methods based on uncertainty analysis are used to calculate the minimum deviations between experimental and simulated values of the main variables that define the performance of a thermoelectric heat pump, under the operating scenarios expected in nearly-zero energy buildings. Results indicate that the narrowest achievable confidence intervals for these deviations are set by the uncertainties in the calculation of the thermoelectric properties of the thermoelectric modules. The minimum deviation in the prediction of the electric power consumed by the thermoelectric heat pump is ±6% in all scenarios. The confidence intervals for the heat flow released to the hot reservoir range from ±8% at high operating voltages of the thermoelectric heat pump to ±23% at low ones, and those for the coefficient of performance range from ±4% to ±21%. These lower limits cannot be reduced unless the uncertainties in the measurement of the thermoelectric properties are reduced. Moreover, the confidence intervals widen as further sources of uncertainty are included in the analysis, so wider intervals are expected once heat exchangers and more complex heat reservoirs are introduced into the system. To limit this, the paper includes several guidelines for uncertainty reduction, intended to increase the reliability of thermoelectric heat pump simulations. The most relevant are an accurate account of the aspect ratio of the thermoelectric modules and the use of temperature and voltage sensors with systematic standard uncertainties below 0.3 °C and 0.01 V, respectively.
The paper demonstrates the relevance of uncertainty propagation analysis in the verification and validation of simulation models in this field, and underlines how misleading it can be to compare only the average values of experimental and simulated results.
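As a hedged illustration of why the coefficient of performance can have a narrower confidence interval than the heat flow it is computed from, the sketch below applies first-order (GUM-style) propagation to a ratio y = Q_h / P. It assumes that the COP is formed as that ratio and that the relative uncertainties of Q_h and P are known. The function name and the correlation coefficient `rho` are illustrative assumptions, not methods or values taken from the paper. Because the expression scales linearly with the input uncertainties, it applies equally to standard uncertainties or to confidence-interval half-widths that share one coverage factor.

```python
import math


def ratio_relative_uncertainty(u_num: float, u_den: float, rho: float = 0.0) -> float:
    """First-order relative uncertainty of y = num / den (hypothetical helper).

    u_num, u_den: relative uncertainties of numerator and denominator
                  (e.g. 0.08 for 8 %).
    rho: assumed correlation coefficient between the errors in num and den.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    # For a ratio, relative variances add, minus a covariance term:
    # (u_y)^2 = u_num^2 + u_den^2 - 2 * rho * u_num * u_den
    var = u_num**2 + u_den**2 - 2.0 * rho * u_num * u_den
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(var, 0.0))


# Illustrative inputs: 8 % on the heat flow, 6 % on the electric power.
print(ratio_relative_uncertainty(0.08, 0.06))         # uncorrelated errors
print(ratio_relative_uncertainty(0.08, 0.06, 0.875))  # strongly correlated errors
```

With uncorrelated inputs of 8% and 6% the result is about 10%. A strong positive correlation, such as errors in both quantities driven by the same thermoelectric-property uncertainties, can reduce it to about 4%. The value `rho = 0.875` is chosen only to show the effect.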