Validating turbine interaction models is essential for understanding the uncertainties and biases inherent in predictions of power output for future wind farms. We present a repeatable, model-agnostic methodology for validating wind farm production models. Power data from the Supervisory Control and Data Acquisition (SCADA) systems of wake-free turbines are combined with turbine power curves to derive inlet wind speeds representative of average conditions on the front row of a wind farm. These wind speeds are used, together with other model inputs, to run each model and predict a power time series for each turbine. The modelled and measured power time series are then compared to derive mean bias error metrics. The methodology is applied at six offshore wind farms to test established and novel turbine interaction models. We compare the distributions of turbine-level power prediction errors across models and wind farms. We find that the new models, CFD.ML and the Stratified Eddy Viscosity model, perform well relative to the established WindFarmer Eddy Viscosity model, and we see increased errors for the largest wind farms. We discuss methodological uncertainties in the input wind speed derivation that may bias the overall error distributions at wind speeds near turbine cut-in and near rated power, and we suggest future methodological refinements.
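As an illustration of the two core steps described above (deriving an inlet wind speed from front-row power, and computing a mean bias error), a minimal Python sketch follows. It is not the implementation used in this work. The power curve values, the function names, the simple front-row averaging, and the sign convention (modelled minus measured) are all assumptions made for illustration. The sketch leaves the inverted wind speed undefined where the power curve is flat, that is, at or below cut-in and at rated power.

```python
import numpy as np

# Hypothetical power curve (illustrative values only, not from any real turbine).
CURVE_WS = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])  # m/s
CURVE_KW = np.array([0.0, 150.0, 400.0, 750.0, 1200.0, 1750.0,
                     2350.0, 2850.0, 3000.0, 3000.0])                       # kW


def power_to_wind_speed(power_kw):
    """Invert the power curve on its strictly increasing part.

    Returns NaN where the inversion is ambiguous (zero power or rated power).
    """
    power_kw = np.asarray(power_kw, dtype=float)
    rated = CURVE_KW.max()
    last = int(np.argmax(CURVE_KW >= rated))  # first point at rated power
    ws = np.interp(power_kw, CURVE_KW[: last + 1], CURVE_WS[: last + 1])
    ambiguous = (power_kw <= 0.0) | (power_kw >= rated)
    return np.where(ambiguous, np.nan, ws)


def inlet_wind_speed(front_row_power_kw):
    """Average the inverted wind speeds over wake-free turbines per time step.

    Input shape is (n_turbines, n_times); NaN entries are ignored.
    """
    ws = power_to_wind_speed(front_row_power_kw)
    valid = ~np.isnan(ws)
    count = valid.sum(axis=0)
    total = np.where(valid, ws, 0.0).sum(axis=0)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)


def mean_bias_error(modelled_kw, measured_kw):
    """Mean of (modelled - measured); the sign convention is assumed here."""
    modelled = np.asarray(modelled_kw, dtype=float)
    measured = np.asarray(measured_kw, dtype=float)
    return float(np.mean(modelled - measured))


# Two front-row turbines, two time steps (made-up numbers).
front_row = np.array([[150.0, 3000.0],
                      [400.0, 750.0]])
inlet = inlet_wind_speed(front_row)
mbe = mean_bias_error([100.0, 200.0], [90.0, 220.0])
```

In the second time step the first turbine is at rated power, so only the second turbine contributes to the inlet wind speed. This is one simple way to handle the flat regions of the power curve, where the derived wind speed is least certain.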