“…where the first term penalizes the deviation of the terminal state from the parametric reference $r_N$, the second term penalizes the energy used in the control actions, and the third and fourth terms are smoothing penalties on the control actions and the states, respectively. To scale the control objective terms we use the weight factors $Q_r = 1.0$, $Q_{dx} = 1.0$, $Q_{du} = 10.0$, and $Q_u = 10.0$. We use $Q_h = 100.0$ for the state constraint penalties, which include the obstacle avoidance constraint (21) and the box constraints on states and control actions. For the datasets we sample 30 000 uniformly distributed initial state conditions $x_0^i$, constraint parameters $p^i$, $b^i$, $c^i$, $d^i$, and terminal state references $r_N^i$. We use one third of the data (10 000 samples) each for the training, validation, and test sets.…”
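The following is a minimal NumPy sketch of how the weighted objective and the three-way data split described above could be assembled. Only the weight values, the sample count, and the equal split come from the text. Everything else is an assumption: the squared-norm form of each term, the squared-hinge penalty, the sign convention that a constraint value $h \le 0$ means feasible, the function names `sample_dataset`, `split_thirds`, and `loss`, and all dimensions and sampling bounds.

```python
import numpy as np

# Weight factors as stated in the text.
Q_R, Q_DX, Q_DU, Q_U, Q_H = 1.0, 1.0, 10.0, 10.0, 100.0


def sample_dataset(n, dims, low, high, seed=0):
    """Draw n uniformly distributed samples for each named quantity.

    dims, low, and high are hypothetical placeholders: the text does not
    give dimensions or sampling ranges for x_0, p, b, c, d, or r_N.
    """
    rng = np.random.default_rng(seed)
    return {k: rng.uniform(low[k], high[k], size=(n, d)) for k, d in dims.items()}


def split_thirds(data, seed=0):
    """Shuffle once, then split every array into equal train/val/test thirds."""
    n = len(next(iter(data.values())))
    idx = np.random.default_rng(seed).permutation(n)
    parts = np.array_split(idx, 3)
    return [{k: v[p] for k, v in data.items()} for p in parts]


def loss(x, u, r_N, h):
    """Weighted objective under an ASSUMED squared-norm / squared-hinge form.

    x:   (N+1, nx) state trajectory
    u:   (N, nu) control actions
    r_N: (nx,) terminal state reference
    h:   constraint values, treated as violated where h > 0 (assumption)
    """
    terminal = Q_R * np.sum((x[-1] - r_N) ** 2)          # terminal reference term
    energy = Q_U * np.sum(u ** 2)                        # control energy term
    du_smooth = Q_DU * np.sum(np.diff(u, axis=0) ** 2)   # control smoothing term
    dx_smooth = Q_DX * np.sum(np.diff(x, axis=0) ** 2)   # state smoothing term
    penalty = Q_H * np.sum(np.maximum(h, 0.0) ** 2)      # constraint violation penalty
    return terminal + energy + du_smooth + dx_smooth + penalty


# Example usage with placeholder dimensions and bounds (hypothetical values).
dims = {"x0": 2, "p": 1, "b": 1, "c": 1, "d": 1, "rN": 2}
low = {k: -1.0 for k in dims}
high = {k: 1.0 for k in dims}
train_set, val_set, test_set = split_thirds(sample_dataset(30_000, dims, low, high))
```

Shuffling a single index permutation and applying it to every array keeps each sampled tuple $(x_0^i, p^i, b^i, c^i, d^i, r_N^i)$ together in the same split. Because 30 000 divides evenly by three, each split holds exactly 10 000 samples.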