A reinforcement learning-based adaptive optimal fuzzy controller is proposed for maximum power point tracking (MPPT) control of a variable-speed permanent magnet synchronous generator-based wind energy generation system. The algorithm consists of a critic, an adaptive optimal fuzzy controller, and an adaptive optimal fuzzy estimator. To reduce the computational burden, the critic is built on an adaptive neuro-fuzzy inference system (ANFIS) network instead of the commonly used neural network. The error between the system output and the estimator output is used as the input of the critic. In addition, the critic is used to calculate the update laws for the parameters of the adaptive optimal fuzzy controller and the adaptive optimal fuzzy estimator by minimizing the input error function. Moreover, the proposed control scheme uses output feedback instead of state feedback and requires neither a system model nor system parameters, so the system is robust to uncertainties and external disturbances. Besides, the stability of the closed-loop system and the convergence of the update law are proven using the Lyapunov stability theorem. Finally, the effectiveness of the proposed reinforcement learning-based adaptive optimal fuzzy control scheme is verified through simulations under various scenarios, such as step wind speed, random wind speed, and system parameter variations. Comparisons with other state-of-the-art control schemes (a neural network reinforcement learning-based adaptive optimal fuzzy controller and a PI controller) are also carried out to demonstrate the advantages of the proposed control scheme.

Numerous control strategies have been introduced, such as PI control [5], sliding mode control (SMC) [6], [7], [8], [9], adaptive control [10], [11], and model predictive control (MPC) [12]. In [5], a PI controller is employed for the current control loop of the multi-motor wind turbine system.
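To make the structure of such a current loop concrete, the following is a minimal sketch of a discrete-time PI current controller. It assumes a simplified first-order stator-current model, L di/dt = v - R i, that ignores back-EMF and d-q cross-coupling. The names PIController and simulate_current_loop, the gains, and the resistance and inductance values are hypothetical illustrations, not the design used in [5].

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller with a forward-Euler integrator (illustrative)."""
    kp: float
    ki: float
    dt: float
    integral: float = 0.0

    def update(self, reference: float, measurement: float) -> float:
        # Tracking error drives both the proportional and the integral term.
        error = reference - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def simulate_current_loop(i_ref=10.0, r=0.5, l=0.01, kp=5.0, ki=250.0,
                          dt=1e-4, steps=2000):
    """Simulate a hypothetical first-order current model L di/dt = v - R i.

    Back-EMF and d-q cross-coupling are ignored; all values are assumptions.
    Returns the current trajectory as a list.
    """
    pi = PIController(kp=kp, ki=ki, dt=dt)
    i = 0.0
    history = []
    for _ in range(steps):
        v = pi.update(i_ref, i)       # controller output: applied voltage
        i += dt * (v - r * i) / l     # forward-Euler step of the plant
        history.append(i)
    return history


if __name__ == "__main__":
    trajectory = simulate_current_loop()
    print(f"final current: {trajectory[-1]:.4f} A")
```

The default gains place the PI zero on the nominal plant pole (ki/kp = r/l = 50 1/s), so the nominal closed loop behaves like a first-order lag. Because the gains are fixed, re-running simulate_current_loop with different r or l values loses that cancellation and changes the transient, which illustrates why a fixed-gain PI loop is sensitive to parameter variations.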
However, this PI controller cannot ensure good performance under varying operating conditions owing to the nonlinearity of the WEGS and the influence of the environment (e.g., wind speed, air density). Next, SMC is a nonlinear control technique used to deal with the parametric uncertainties and disturbances of the WECS in MPPT control. In [6] and [7], high-order SMCs are applied to improve the performance of the WECS by reducing the chattering phenomenon with a continuous control input. However, the fluctuations of the output voltage and power remain high [7]. In [8], an enhanced reaching law-based SMC method is introduced for the MPPT control of offshore WECS; it consists of two loops, namely the current control loop with a conventional PI controller and the speed control loop with a finite-time reaching SMC, together with a mechanical torque observer. This control strategy significantly improves the ability of the WEGS to resist uncertainties and disturbances. In [9], a fixed-time fractional-order SMC is designed for both rotor side converter (...