In this paper, a set of best practice data sharing guidelines for wind turbine fault detection model evaluation is developed, which can help practitioners overcome the main challenges of digitalisation. Digitalisation is one of the key drivers for reducing costs and risks over the whole wind energy project life cycle. One of the largest challenges in successfully implementing digitalisation is the lack of data sharing and collaboration between organisations in the sector. In order to overcome this challenge, a new collaboration framework called WeDoWind was developed in recent work. The main innovation of this framework is the way it creates tangible incentives to motivate and empower different types of people from all over the world to share data and knowledge in practice. In this present paper, the challenges related to comparing and evaluating different SCADA-data-based wind turbine fault detection models are investigated by carrying out a new case study, the “WinJi Gearbox Fault Detection Challenge”, based on the WeDoWind framework. A total of six new solutions were submitted to the challenge, and a comparison and evaluation of the results show that, in general, some of the approaches (Particle Swarm Optimisation algorithm for constructing health indicators, performance monitoring using Deep Neural Networks, Combined Ward Hierarchical Clustering and Novelty Detection with Local Outlier Factor and Time-to-failure prediction using Random Forest Regression) appear to exhibit high potential to reach the goals of the Challenge. However, there are a number of concrete things that would have to have been done by the Challenge providers and the Challenge moderators in order to ensure success. This includes enabling access to more details of the different failure types, access to multiple data sets from more wind turbines experiencing gearbox failure, provision of a model or rule relating fault detection times or a remaining useful lifetime to the estimated costs for repairs, replacements and inspections, provision of a clear strategy for training and test periods in advance, as well as provision of a pre-defined template or requirements for the results. These learning outcomes are used directly to define a set of best practice data sharing guidelines for wind turbine fault detection model evaluation. The guidelines can be used by researchers in the sector in order to improve model evaluation and data sharing in the future.