In this paper, an online optimization approach of a fractional-order PID controller based on a fractional-order actor-critic algorithm (FOPID-FOAC) is proposed. The proposed FOPID-FOAC scheme exploits the advantages of the FOPID controller and FOAC approaches to improve the performance of nonlinear systems. The proposed FOAC is built by developing a FO-based learning approach for the actor-critic neural network with adaptive learning rates. Moreover, a FO rectified linear unit (RLU) is introduced to enable the AC neural network to define and optimize its own activation function. By the means of the Lyapunov theorem, the convergence and the stability analysis of the proposed algorithm are investigated. The FO operators for the FOAC learning algorithm are obtained using the gray wolf optimization (GWO) algorithm. The effectiveness of the proposed approach is proven by extensive simulations based on the tracking problem of the two degrees of freedom (2-DOF) helicopter system and the stabilization issue of the inverted pendulum (IP) system. Moreover, the performance of the proposed algorithm is compared against optimized FOPID control approaches in different system conditions, namely when the system is subjected to parameter uncertainties and external disturbances. The performance comparison is conducted in terms of two types of performance indices, the error performance indices, and the time response performance indices. The first one includes the integral absolute error (IAE), and the integral squared error (ISE), whereas the second type involves the rising time, the maximum overshoot (Max. OS), and the settling time. The simulation results explicitly indicate the high effectiveness of the proposed FOPID-FOAC controller in terms of the two types of performance measurements under different scenarios compared with the other control algorithms.