More generally, our approach extends the recent line of work on neural parameterizations of classic grammars (Jiang et al., 2016; Han et al., 2017, 2019; Kim et al., 2019; Jin et al., 2019; Zhu et al., 2020; Yang et al., 2021a,b; Zhao and Titov, 2020, inter alia), although unlike these works we focus on the transduction setting.

Data Augmentation. Our work is also related to the line of work that utilizes grammatical or alignment structures to guide flexible neural seq2seq models via data augmentation (Jia and Liang, 2016; Fadaee et al., 2017; Andreas, 2020; Akyürek et al., 2021; Shi et al., 2021; Yang et al., 2022; Qiu et al., 2022) or auxiliary supervision (Cohn et al., 2016; Mi et al., 2016; Liu et al., 2016). In contrast to these works, our data augmentation module has stronger inductive biases for hierarchical structure due to its explicit use of latent tree-based alignments.