The performance of multi-fidelity numerical models in reproducing the impacts of internal and external wakes on power production is evaluated against SCADA data from two operational offshore wind farms in the North Sea. Results are presented for two engineering wake models, Jensen and Cumulative Curl (each in default and tuned settings), and a higher-fidelity Reynolds-averaged Navier-Stokes (RANS) setup. Default model properties follow developer recommendations, while the tuned engineering models were configured against RANS results for an idealized wind farm layout consisting of several turbine setups. Input data for the model evaluation scenarios were obtained from a met mast located between the wind farms and binned according to atmospheric stability, wind speed, wind direction, and turbulence intensity. Results show that the higher fidelity of RANS is most advantageous for inflows that generate more complex internal wakes (e.g., deep-array effects) and for replicating external wake impacts. The tuned setups of the engineering models, modified according to stability, outperform their default setups. One of the tuned models shows close farm-level agreement with RANS and even outperforms it in some cases of lower flow complexity.