Numerical wake models used in offshore wind farm development are regularly evaluated against field measurements of atmospheric conditions and turbine performance data. However, measurement data from offshore sites can be affected by a variety of issues that are often neglected in data processing procedures and can strongly impact the reliability of the metrics used for model evaluation. While some of these issues have been addressed through assumptions, a variety of shortcomings are still frequently overlooked. This work presents the issues found in the measurement data used to compile an evaluation dataset for numerical wake model assessment. The dataset contains long-term operational data from two neighboring offshore wind farms in the German Bight and meteorological data from a met mast located between the two wind farms. The methodologies used to overcome these issues with minimal impact on the reliability of the evaluation dataset are presented and discussed. Furthermore, the repercussions of overlooking such shortcomings are highlighted, and future challenges posed by the planned expansion of offshore wind energy capacity are addressed.