<p>Control theory is commonly applied to systems with a known model or known system dynamics. In practice, however, this is a formidable requirement, since not all state information can be measured. The Output Feedback (OPFB) scheme in control systems also has a weakness in that it requires an observer, which is rather contradictory because designing an observer itself requires knowledge of the system dynamics. This research proposes an optimal control scheme using Deep Recurrent Q-Networks (DRQN) to generate an optimal control signal trajectory from a collection of input and output data measured on the system itself. The proposed approach is based on the Q-Learning method from the Reinforcement Learning (RL) framework. A Long Short-Term Memory (LSTM) network is used to approximate the Q-function and determine the control signals for a system without a known model. The proposed method has been tested on four case studies. The control signal trajectory generated by our proposed algorithm is much smaller than the control signal generated by the classical Q-Learning scheme. These results are directly relevant to the aim of OPFB, namely that the controller is designed to regulate the system (bring the state trajectory to zero) while minimizing control signal energy.</p>
<p>The same conclusion is supported empirically by the norms of the Q-function trajectories. The norm of the Q-function trajectory for our proposed algorithm on the 1st, 2nd, 3rd, and 4th case studies is 2.11E-08, 3.15E-06, 3.79E-09, and 1.59E-13, respectively.</p>
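<p>To make the idea of an LSTM-based Q-function approximator concrete, the sketch below shows one possible form such a network could take. It is not the authors' implementation: the framework (PyTorch), the class name, the layer sizes, and the discretised action grid are all illustrative assumptions; only the overall structure, an LSTM that consumes a history of measured outputs and a linear head that produces Q-values over candidate control actions, reflects the scheme described above.</p>
<pre><code>
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Minimal sketch of an LSTM Q-function approximator for output-feedback control.

    Maps a window of past measured outputs to Q-values over a discretised
    set of control actions. All names and sizes are hypothetical.
    """
    def __init__(self, obs_dim, n_actions, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim) history of plant output measurements
        _, (h_n, _) = self.lstm(obs_seq)
        # Q-values for each candidate control action, shape (batch, n_actions)
        return self.q_head(h_n[-1])

# Greedy action selection from an output history (exploration and the
# Q-learning temporal-difference update are omitted in this sketch).
q_net = RecurrentQNetwork(obs_dim=2, n_actions=11)
history = torch.zeros(1, 10, 2)          # placeholder output history
u_index = q_net(history).argmax(dim=-1)  # index into the control-action grid
</code></pre>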