The energy performance of a building envelope as predicted or designed often differs from the performance achieved in reality. This phenomenon, known as the building fabric performance gap, is usually caused by poor workmanship or by material properties that differ from those specified in the design phase. To bridge this gap and quantify the thermal characteristics of the as-built envelope, different methodologies based on dedicated on-site monitoring campaigns are applied. One recognized method for evaluating a building’s Heat Loss Coefficient (HLC) is the co-heating test, during which a vacated house is heated to elevated indoor temperatures while the delivered heat and the indoor and outdoor environments are precisely monitored. In recent years, researchers have been turning towards non-intrusive methods, and the co-heating test is often used as the benchmark to validate them. However, the validity of this approach is hard to prove because the reference itself is undefined. The aim of this work is therefore to evaluate the reliability of the HLC estimate obtained from the co-heating test. Since actual reference values are required for this investigation, the prescribed co-heating testing procedure and established statistical models are applied to artificially generated data that resemble ideal monitoring campaigns, for a simulated sample of single-family dwellings characteristic of Flanders. The main aim is to assess the trends in the results when different estimation approaches are applied to the same ideal monitoring dataset. Moreover, the analysis can be performed on a large sample and, since indoor conditions are prescribed, outdoor conditions can be manipulated to isolate the effect of different weather parameters on the estimate.
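The idea of testing an estimator against a known reference can be illustrated with a minimal Python sketch. It is not the procedure or the statistical models used in this work: it assumes a simple steady-state daily energy balance, Q = HLC · ΔT − gA · I_sol, generates synthetic daily averages from a chosen "true" HLC, and recovers the HLC by ordinary least squares. The function names, the solar aperture term gA, the parameter ranges and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_coheating(hlc_true, solar_aperture, n_days=30, seed=0):
    """Generate synthetic daily averages for an idealised co-heating test.

    Assumed balance (illustrative only): Q = HLC * dT - gA * I_sol + noise.
    """
    rng = np.random.default_rng(seed)
    delta_t = rng.uniform(12.0, 22.0, n_days)   # indoor-outdoor difference [K]
    i_sol = rng.uniform(10.0, 120.0, n_days)    # solar irradiance [W/m2]
    noise = rng.normal(0.0, 20.0, n_days)       # measurement noise [W]
    q_heat = hlc_true * delta_t - solar_aperture * i_sol + noise
    return delta_t, i_sol, q_heat


def estimate_hlc(delta_t, i_sol, q_heat):
    """Least-squares fit of Q = HLC * dT - gA * I_sol (no intercept)."""
    design = np.column_stack([delta_t, -i_sol])
    coef, *_ = np.linalg.lstsq(design, q_heat, rcond=None)
    return coef[0], coef[1]  # HLC [W/K], solar aperture gA [m2]


HLC_TRUE = 150.0  # hypothetical reference value [W/K]
delta_t, i_sol, q_heat = simulate_coheating(HLC_TRUE, solar_aperture=4.0)
hlc_est, ga_est = estimate_hlc(delta_t, i_sol, q_heat)
rel_error = (hlc_est - HLC_TRUE) / HLC_TRUE
print(f"HLC estimate: {hlc_est:.1f} W/K, relative error: {rel_error:+.2%}")
```

Because the reference value is set by the data generator, the relative error of the estimate can be computed directly, which is not possible with measurements from a real dwelling.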