Ultra-Large-Scale (ULS) systems face continuously evolving field workloads in terms of activated/disabled feature sets, varying usage patterns, and changing deployment configurations. These evolving workloads often have a large impact on the performance of a ULS system. Hence, continuous load testing is critical to ensuring the error-free operation of such systems. A common challenge facing performance analysts is to validate whether a load test closely resembles the current field workloads. Such validation may be performed by comparing execution logs from the load test and the field. However, the size and unstructured nature of execution logs make such a comparison infeasible without automated support. In this paper, we propose an automated approach that validates whether a load test resembles the field workload and, if not, determines how they differ, by comparing execution logs from a load test and the field. Performance analysts can then update their load test cases to eliminate such differences, thereby creating more realistic load test cases. We perform three case studies on two large systems: one open-source system and one enterprise system. Our approach identifies differences between load tests and the field with a precision of ≥75%, compared to only ≥16% for the state-of-the-practice.