“…The RL framework consists of an agent (e.g., a neural network in deep RL) that interacts with an environment to learn a policy that maximizes the cumulative reward over a long time horizon [315]. In recent years, RL has been explored for fluid dynamics problems including animal locomotion [116,279,339], control of chaotic dynamics [41,59,337], drag reduction of bluff bodies [271,282,330], flow separation control [307], and turbulence closure modeling [242]. Beyond computer simulation environments, RL has also been applied effectively to active flow control around bluff bodies in an experimental setup [98].…”
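
The agent-environment interaction described in the excerpt can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the cited works: the one-dimensional `ToyEnv`, the hand-written `toward_origin` policy (standing in for a learned one), and the discount factor `gamma`. The sketch shows only the interaction loop and the cumulative-reward objective; the learning step that would improve the policy is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position."""
    state: int = 3

    def step(self, action: int) -> float:
        # Move by the chosen action (-1, 0, or +1).
        self.state += action
        # The reward penalizes distance from the origin.
        return -abs(self.state)


def toward_origin(state: int) -> int:
    """Hand-written policy (an assumption) standing in for a learned one."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def rollout(env: ToyEnv, policy: Callable[[int], int], horizon: int) -> List[float]:
    """Run the agent-environment loop for `horizon` steps and collect rewards."""
    rewards = []
    for _ in range(horizon):
        action = policy(env.state)        # agent observes the state and acts
        rewards.append(env.step(action))  # environment returns a reward
    return rewards


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Cumulative reward: sum over t of gamma**t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


rewards = rollout(ToyEnv(state=3), toward_origin, horizon=5)
total = discounted_return(rewards, gamma=0.5)
```

In an actual RL setting the policy would be updated from the collected rewards so that this return increases over repeated rollouts; in the flow-control applications listed above, the environment would be a flow simulation or an experimental rig rather than this toy.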