To correct wavefront aberrations, adaptive optics (AO) systems commonly employ proportional-integral (PI) control, a process that depends strictly on the response matrix of the deformable mirror. In AO systems, various factors cause alignment error between the Hartmann-Shack (HS) wavefront sensor and the deformable mirror. In the conventional control method, the response matrix can be recalibrated to reduce the impact of alignment error, but the impact cannot be eliminated. This paper proposes a control method based on a deep learning control model (DLCM) to compensate for wavefront aberrations, eliminating the dependence on the deformable mirror response matrix. Based on the wavefront slope data, cost functions are defined for the model network and the actor network, and a gradient optimization algorithm improves the efficiency of network training. The model network guarantees stability and convergence speed, while the actor network improves control accuracy, realizing online identification and self-adaptive control of the system. A parameter-sharing mechanism between the model network and the actor network controls the system gain. Simulation results show that the DLCM has good adaptability and stability. Through self-learning, it improves the convergence accuracy and the number of iterations, as well as the adjustment tolerance of the system.

Alignment errors include the position error of the deformable mirror relative to the center of the HS detector, as well as the rotation error α of the deformable mirror. Alignment error changes the designed matching relationship between the deformable mirror and the HS sensor, and therefore introduces a large error into the response matrix. Because the PI controller strictly relies on the response matrix B, this directly degrades the performance of the PI controller [7,13,14].
IMPLEMENTATION PRINCIPLE OF THE DLCM
The DLCM consists of a model network M(S, V|θ_m), an actor network A(S, V|θ_a), a decision sample space R, and two cost functions (J and J′), as defined in Eqs. (4) and (9). The model network and the actor network play different roles. The model network has three functions: (1) it shares parameters with the actor network; (2) it stabilizes the output of the actor network; (3) it improves the convergence speed of the actor network. The actor network has two functions: (1) it updates the decision sample space and guides the update of the model network; (2) it improves the control accuracy of the DLCM. The decision sample space is a queue structure in which the optimal and most recent decision samples are stored. The cost function J is used to train the model network and learn the control rules from the decision samples. The cost function J′ is used to update the actor network and the model network in real time. The calculation of the cost function J′ is the key to the system.
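The components above can be arranged as in the following minimal sketch. The network architecture, sizes, and the stand-in cost are assumptions for illustration only; the paper's actual M(S, V|θ_m) and A(S, V|θ_a) are deep networks trained with the cost functions of Eqs. (4) and (9).

```python
from collections import deque
import numpy as np

class TinyNet:
    """Minimal one-layer stand-in for the model or actor network."""
    def __init__(self, n_in, n_out, rng):
        self.W = 0.1 * rng.standard_normal((n_out, n_in))  # parameters θ
    def __call__(self, x):
        return np.tanh(self.W @ x)

rng = np.random.default_rng(0)
n_slopes, n_act = 128, 61                  # assumed dimensions

model = TinyNet(n_slopes, n_act, rng)      # model network M(S, V|θ_m)
actor = TinyNet(n_slopes, n_act, rng)      # actor network A(S, V|θ_a)

# Parameter sharing: the actor starts from the model's parameters, so the
# model constrains and stabilizes the actor's output.
actor.W = model.W.copy()

# Decision sample space R: a bounded queue of (slopes, voltages, cost)
# samples; the newest sample is pushed in and the oldest is dropped.
R = deque(maxlen=256)

slopes = rng.standard_normal(n_slopes)     # example HS slope measurement S
voltages = actor(slopes)                   # actor proposes voltages V
cost = float(np.sum(slopes**2))            # J' stand-in: residual slope energy
R.append((slopes, voltages, cost))         # actor updates the sample space
```

The queue keeps training data fresh: the model network learns control rules from these stored decision samples (via J), while the real-time cost (J′ in the paper) drives the online updates of both networks.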
A. Model Network