With Over-The-Top traffic being extensively encrypted end-to-end, network operators typically lack insight into the performance of these services as perceived by end users. Yet such insight is essential for QoE-aware network management and for alleviating problems that may originate in the network. Network operators therefore have a clear interest in finding ways to estimate service performance in terms of Key Performance Indicators (KPIs) and Quality of Experience (QoE). In recent years, machine learning (ML)-based models have proven capable of inferring QoE/KPIs from patterns in encrypted network traffic. The focus has mostly been on adaptive video streaming services, given their share of global network traffic. These ML-based models have typically been trained and tested on a single dataset collected under specific conditions. Going beyond related work on QoE/KPI estimation, we collected two large datasets of YouTube streaming sessions using the same setup at two different locations in Europe, and analyzed the extent to which differences in network characteristics and location specifics influence the performance of such models. This is of interest because models that are applicable across diverse networks would significantly reduce the extensive data collection typically required for ML-based approaches. In this paper, we compare models trained and tested on a single dataset/location (network-specific), models trained on the merged dataset (general), and models trained on one dataset and tested on the other (cross-tested). The results show that the performance of general models is comparable to that of network-specific models, but cross-tests exhibit a considerable reduction in performance.
To understand and improve the cross-network applicability of the models in the future, the paper also investigates the underlying reasons for the performance degradation.
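The three evaluation scenarios compared in the paper (network-specific, general, cross-tested) can be sketched roughly as follows. This is only a minimal illustration of the train/test protocol, not the models or features used in the paper: the synthetic data generator, the single feature, the threshold classifier, the 80/20 split, and all function names are hypothetical assumptions, and the numbers it produces say nothing about the paper's results.

```python
import random


def make_toy_sessions(n, shift, seed):
    """Hypothetical stand-in for one location's dataset: (feature, label) pairs.

    `shift` mimics a location-specific offset in a traffic feature (assumption).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(label * 2.0 + shift, 0.5)
        data.append((feature, label))
    return data


def split(data, train_frac=0.8):
    """Split a dataset into train and test parts (assumed 80/20)."""
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]


def fit_threshold(train):
    """Toy classifier: a threshold halfway between the two class means."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in train if y == cls]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2.0


def accuracy(threshold, test):
    """Fraction of samples whose thresholded feature matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in test)
    return hits / len(test)


def evaluate_scenarios(ds_a, ds_b):
    """Evaluate network-specific, general, and cross-tested setups."""
    train_a, test_a = split(ds_a)
    train_b, test_b = split(ds_b)
    model_a = fit_threshold(train_a)
    model_b = fit_threshold(train_b)
    model_general = fit_threshold(train_a + train_b)
    return {
        # trained and tested on the same location
        "network_specific_A": accuracy(model_a, test_a),
        "network_specific_B": accuracy(model_b, test_b),
        # trained on the merged training data, tested on the merged test data
        "general": accuracy(model_general, test_a + test_b),
        # trained on one location, tested on the whole other dataset
        "cross_A_to_B": accuracy(model_a, ds_b),
        "cross_B_to_A": accuracy(model_b, ds_a),
    }


if __name__ == "__main__":
    location_a = make_toy_sessions(500, shift=0.0, seed=1)
    location_b = make_toy_sessions(500, shift=1.5, seed=2)
    for name, acc in evaluate_scenarios(location_a, location_b).items():
        print(f"{name}: {acc:.3f}")
```

In the cross-tested case the model never sees data from the target location, so the entire second dataset can serve as the test set; the other two scenarios need a held-out split.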