This paper presents a study of how accurately daylight provision and overheating levels in dwellings can be predicted, depending on the method used (machine learning (ML) vs. prediction formulas) and on the training and validation datasets. An existing high-rise building located in Tallinn, Estonia, was used to compare the best ML predictive method with novel prediction formulas. Daylight provision was quantified according to the European daylight standard EN 17037:2018, based on the minimum Daylight Factor (minDF). Overheating was quantified with the degree-hour (DH) metric included in local regulations. The dataset consists of minDF and DH values for different combinations of design parameters: window-to-floor ratio, level of obstruction, and the g-value and visible transmittance of the glazing system. Different training and validation datasets were derived from a main dataset of 5120 minDF values and 40960 DH values, obtained through simulations with Radiance and EnergyPlus, respectively. For each combination of training and validation dataset, the accuracy of the ML model was quantified and compared with that of the prediction formulas. According to our results, the ML model could provide more accurate minDF and DH predictions than the prediction formulas for the same design parameters. However, the number of room combinations needed to train the ML model is larger than the number needed to calibrate the prediction formulas. The paper discusses in detail which method to use in practice, depending on time and accuracy constraints.