“…Previous non-monotonic methods (Serdyuk et al, 2018; Zhang et al, 2018; Zhou et al, 2019a,b; Zhang et al, 2019; Welleck et al, 2019) jointly leverage L2R and R2L information. Non-monotonic methods are also widely used in many tasks (Huang et al, 2018; Shu and Nakayama, 2018), such as parsing (Goldberg and Elhadad, 2010), image captioning (Mehri and Sigal, 2018), and dependency parsing (Kiperwasser and Goldberg, 2016). Similarly, insertion-based methods (Gu et al, 2019; Stern et al, 2019) predict the next token and the position at which to insert it.…”