“…Work on designing optimal control policies with machine learning techniques can be broadly divided into two categories, depending on whether a model is used when deriving the controller: model-free approaches [29,30,31,32] and model-based approaches [22,33,34,35,36,37,38,39,40,41,42,43,44,45]. Our approach belongs to the second category, since the controllers are designed from a model estimated from the training data.…”