“…During the development of neural networks, orthogonality was first shown to be useful in mitigating the vanishing or exploding gradient problem (Bengio, Simard, and Frasconi 1994), especially in recurrent neural networks (RNNs) (Pascanu, Mikolov, and Bengio 2013; Le, Jaitly, and Hinton 2015; Wisdom et al. 2016; Arjovsky, Shah, and Bengio 2016; Jing et al. 2017; Hyland and Rätsch 2017; Vorontsov et al. 2017; Helfrich and Ye 2020). To improve the efficiency of optimization under orthogonality constraints, many techniques have been used, e.g., Householder reflections (Mhammedi et al. 2017), the Cayley transform (Helfrich, Willmott, and Ye 2018; Maduranga, Helfrich, and Ye 2019), and exponential-map-based parameterization (Lezcano-Casado 2019; Lezcano-Casado and Martínez-Rubio 2019).…”
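
To make the idea behind one of these parameterizations concrete, the following is a minimal sketch of building an orthogonal matrix as a product of Householder reflections, H = I - 2vvᵀ/(vᵀv). It is an illustration under stated assumptions, not the method of any of the cited papers: the function names (`householder`, `orthogonal_from_vectors`, `orthogonality_error`) and the choice of random Gaussian vectors are hypothetical, and a real RNN implementation would treat the vectors as trainable parameters rather than fixed samples.

```python
import random


def householder(v):
    """Return the reflection H = I - 2 v v^T / (v^T v) as a list of rows."""
    n = len(v)
    norm_sq = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq
             for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonal_from_vectors(vectors):
    """Multiply the reflections defined by each vector, starting from I."""
    n = len(vectors[0])
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v in vectors:
        q = matmul(q, householder(v))
    return q


def orthogonality_error(q):
    """Largest absolute entry of Q^T Q - I."""
    n = len(q)
    qtq = matmul(transpose(q), q)
    return max(abs(qtq[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Hypothetical setup: three unconstrained 4-dimensional vectors.
rng = random.Random(0)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
Q = orthogonal_from_vectors(vectors)
print(orthogonality_error(Q))  # expected to be at floating-point rounding level
```

Because each reflection is orthogonal for any nonzero `v`, the product stays orthogonal however the vectors change, so the vectors can be updated freely without a separate re-orthogonalization step.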