Abstract. Rolling is a well-established forming process for producing finished or semi-finished products in various industries. Although highly automated, most rolling processes are designed manually by experts based on their knowledge, highly specialized heuristics, and analytical process models or numerical simulations. This manual design approach does not yield pass schedules that are optimized with respect to multiple objectives. Previous work [1] has shown the potential of coupling reinforcement learning (RL) with fast analytical rolling models (FRM) to optimize hot rolling processes. However, the pass schedules designed in that work do not robustly reach the desired final height within typical industrial tolerances. Therefore, in this paper the existing approach of coupling RL with an FRM is extended by dynamically adjusted ranges for the height reductions. This extension guarantees that the target height is always reached exactly. In addition to the height reduction, the RL algorithm can determine the inter-pass time, the initial slab temperature, and the rolling velocity. For the optimization, an objective function, called the reward function, was developed that considers all relevant optimization objectives, such as the final grain size and the energy consumption. An exemplary training was performed for a defined starting height (140 mm) and final height (25 mm). The resulting automatically designed pass schedules reach the target height and fulfill all defined optimization objectives, including the required average austenite grain size.
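The idea of dynamically bounded height reductions can be illustrated with a minimal sketch. It is not the formulation used in the paper; it assumes a fixed number of passes and hypothetical absolute limits `d_min` and `d_max` (in mm) on the reduction per pass, and all function names are illustrative. Before each pass, the admissible range is narrowed so that the remaining passes can still remove exactly the remaining height, and a normalized agent action in [0, 1] is mapped into that range.

```python
def reduction_bounds(height, target, passes_left, d_min, d_max):
    """Admissible height reduction (mm) for the current pass.

    Assumption (not from the paper): every pass must reduce the height by
    at least d_min and at most d_max, and the number of passes is fixed.
    """
    remaining = height - target
    # The later passes can remove at most (passes_left - 1) * d_max ...
    lo = max(d_min, remaining - (passes_left - 1) * d_max)
    # ... and must remove at least (passes_left - 1) * d_min.
    hi = min(d_max, remaining - (passes_left - 1) * d_min)
    if lo - hi > 1e-9:
        raise ValueError("target height is not reachable with these limits")
    return lo, max(lo, hi)


def rollout(start, target, n_passes, d_min, d_max, policy):
    """Apply a policy returning actions in [0, 1]; return the height after each pass."""
    height = start
    heights = [height]
    for k in range(n_passes):
        lo, hi = reduction_bounds(height, target, n_passes - k, d_min, d_max)
        action = min(1.0, max(0.0, policy(k, height)))  # clamp to [0, 1]
        height -= lo + action * (hi - lo)  # map the action into the dynamic range
        heights.append(height)
    return heights


# Example: 140 mm -> 25 mm in 7 passes, always taking the largest allowed reduction.
schedule = rollout(140.0, 25.0, 7, 5.0, 30.0, lambda k, h: 1.0)
print(schedule)  # [140.0, 110.0, 80.0, 50.0, 40.0, 35.0, 30.0, 25.0]
```

In this sketch the range collapses to a single value on the last pass, so the final height equals the target regardless of the actions chosen earlier, provided the target is reachable from the starting height at all.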