“…While preserving the convergence properties of the accelerated gradient method, they provide fast convergence of the gradients to zero and reduce the oscillatory behavior. Several recent studies have been devoted to this subject; see Attouch, Chbani, Fadili, and Riahi [7], Boţ, Csetnek, and László [20], Kim [24], Lin and Jordan [25], Shi, Du, Jordan, and Su [27], and Alecsa, László, and Pintea [4] for an implicit version of Hessian-driven damping. Applications to deep learning have recently been developed by Castera, Bolte, Févotte, and Pauwels [23].…”
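For context, a representative form of such dynamics (the exact systems and parameter choices vary across the cited works) is the inertial system with Hessian-driven damping studied, e.g., in [7], with damping parameters $\alpha, \beta > 0$:
\[
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^2 f(x(t))\,\dot{x}(t) + \nabla f(x(t)) = 0.
\]
Since $\nabla^2 f(x(t))\,\dot{x}(t) = \frac{d}{dt}\nabla f(x(t))$, the Hessian-driven term can be discretized as a difference of consecutive gradients, so the resulting algorithms remain first-order methods despite the second-order term in the dynamics.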