“…They do not call for function evaluations but require tuning the learning rate and possibly further hyper-parameters such as the mini-batch size. Since this tuning effort may be computationally very demanding [15], more sophisticated approaches use line-search or trust-region strategies to choose the learning rate adaptively and thus avoid such tuning, see [2,4,5,9,14,15,25]. In this context, the function and gradient approximations have to satisfy sufficient accuracy requirements with some probability.…”
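As an illustration of the last sentence only (schematic notation, not taken from the cited works [2,4,5,9,14,15,25]): with $f_k(\cdot)$ and $g_k$ denoting mini-batch estimates of $f$ and $\nabla f(x_k)$, a step size $t_k$ might be accepted through an Armijo-type test evaluated on the estimates,
\[
  f_k(x_k - t_k g_k) \;\le\; f_k(x_k) - c\, t_k\, \|g_k\|^2, \qquad c \in (0,1),
\]
while the estimates themselves are required to be sufficiently accurate with a prescribed probability, e.g.
\[
  \Pr\!\big( \|g_k - \nabla f(x_k)\| \le \kappa\, t_k\, \|g_k\| \big) \;\ge\; p, \qquad \kappa > 0,\; p \in (0,1].
\]
If the test fails, $t_k$ is reduced as in a deterministic backtracking line search; the difference is that sufficient decrease and accuracy now hold only with probability at least $p$, which is what the probabilistic accuracy requirements referred to above formalize.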