It is widely known that large discrepancies in simulation results can exist between different BEMPs. The result is a lack of confidence in building simulation among many users and stakeholders. In building energy code development and energy labeling programs, where building simulation plays a key role, there are also confusing and misleading claims that some BEMPs are better than others. To address these problems, it is essential to identify and understand the differences between widely used BEMPs, and the impact of these differences on load simulation results, through detailed comparisons of these BEMPs from source code to results.
The primary goal of this work was to research methods and processes that would allow a thorough scientific comparison of the BEMPs. The secondary goal was to provide a list of strengths and weaknesses for each BEMP, based on an in-depth understanding of its modeling capabilities, mathematical algorithms, advantages, and limitations. This list is intended to guide the use of BEMPs in the design and retrofit of buildings, especially to support China's building energy standard development and energy labeling program. The research findings could also serve as a good reference for improving the modeling capabilities and applications of the three BEMPs. The methodologies, processes, and analyses employed in the comparison work could also be used to compare other programs.
The load calculation method of each program was analyzed and compared to identify differences in solution algorithms, modeling assumptions, and simplifications. Identifying the inputs of each program, and their default values or algorithms for load simulation, was a critical step. These defaults tend to be overlooked by users but can lead to large discrepancies in simulation results. Because weather data are an important input, the weather file formats and weather variables used by each program were summarized, and some common mistakes in the weather data conversion process were discussed.
ASHRAE Standard 140-2007 tests were carried out to test the fundamental load calculation capabilities of the three BEMPs, with the inputs for each test case strictly defined and specified. The tests indicated that the cooling and heating load results of the three BEMPs fell mostly within the range of results from other programs. Based on the ASHRAE 140-2007 test results, the finer differences between DeST and EnergyPlus were further analyzed by designing and conducting additional tests. Potential key influencing factors (such as internal gains, air infiltration, and the convection coefficients of windows and opaque surfaces) were added one at a time to a simple base case with an analytical solution, to compare their relative impacts on load calculation results.
Finally, special tests were designed and conducted to ascertain the potential limitations of each program in performing accurate load calculations. The heat balance module was tested for both single-zone and double-zone cases. Furthermore, cooling and heating load calculations were compared ...