Photovoltaic (PV) solar energy has become a benchmark technology for electricity generation. Installed and planned plants have very large capacities and occupy large areas. This growth in plant size creates new operation and maintenance challenges, such as optimizing the number of installed sensors, managing large volumes of data, and reducing maintenance time. This paper presents a methodology for diagnosing failures from data measured in the plant. It combines supervised regression machine learning with comparison algorithms, and it covers the sensors, the inverters, the junction boxes, and the power reduction caused by soiling. The methodology allows the detection of production losses of around 1-5% in the plant. The algorithms have been tested with real data from PV plants and have detected common failures such as production drops in strings and losses due to soiling.
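
To make the regression-plus-comparison idea concrete, the following is a minimal sketch and not the models used in this paper. It assumes that string current can be approximated by a linear function of irradiance and of irradiance times module temperature, fits that relation by least squares on healthy reference data, and then compares the measured output of each string with the regression estimate. All function names, the 3% threshold, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def design_matrix(irradiance, temperature):
    # Assumed model form: I ~ a*G + b*G*T + c (an illustrative choice).
    return np.column_stack(
        [irradiance, irradiance * temperature, np.ones_like(irradiance)]
    )


def fit_expected_current(irradiance, temperature, current):
    """Supervised regression step: least-squares fit on healthy reference data."""
    coef, *_ = np.linalg.lstsq(
        design_matrix(irradiance, temperature), current, rcond=None
    )
    return coef


def predict_current(coef, irradiance, temperature):
    """Expected string current for the given operating conditions."""
    return design_matrix(irradiance, temperature) @ coef


def flag_underperforming(measured, expected, threshold=0.03):
    """Comparison step: relative production loss per string versus the estimate.

    measured has shape (n_samples, n_strings); expected has shape (n_samples,).
    The 3% threshold is a hypothetical value inside the 1-5% range of interest.
    """
    loss = 1.0 - measured.sum(axis=0) / expected.sum()
    return loss, loss > threshold


# Synthetic data standing in for plant measurements (not real data).
rng = np.random.default_rng(0)
G = rng.uniform(200.0, 1000.0, 500)   # irradiance, W/m^2
T = rng.uniform(15.0, 60.0, 500)      # module temperature, deg C
healthy = 0.009 * G * (1.0 + 0.0005 * (T - 25.0))  # idealized string current, A

coef = fit_expected_current(G, T, healthy + rng.normal(0.0, 0.02, 500))

# Four strings under the same conditions; the third one loses 4% of its output.
scale = np.array([1.0, 1.0, 0.96, 1.0])
measured = healthy[:, None] * scale + rng.normal(0.0, 0.02, (500, 4))

loss, flags = flag_underperforming(measured, predict_current(coef, G, T))
print(np.round(loss, 3), flags)
```

Aggregating over many samples before comparing is a deliberate choice in this sketch: measurement noise averages out, so a persistent loss of a few percent becomes distinguishable from sample-to-sample scatter.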