“…T5 (Raffel et al., 2019) and ELECTRA (Clark et al., 2020), in UnitedQA we further study techniques to improve and stabilize model training for both extractive and generative readers. Specifically, we consider posterior differential regularization (Cheng et al., 2020b) and distant-supervision assumptions (Cheng et al., 2020a) to enhance the extractive reader. For the generative reader, we incorporate attention bias (Lewis et al., 2020a) into T5-FID (Izacard and Grave, 2020), and improve unconstrained generation training with adversarial training (Ju et al., 2019; Jiang et al., 2020).…”
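Posterior differential regularization, in broad terms, penalizes the discrepancy between a model's output distribution on a clean input and on a perturbed version of it. The following is a minimal, hypothetical sketch of such a penalty using a symmetric KL divergence; the function names and the choice of symmetric KL are illustrative assumptions, not the exact formulation of Cheng et al. (2020b):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p), smoothed by eps."""
    kl_pq = sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * (math.log(qi + eps) - math.log(pi + eps)) for pi, qi in zip(p, q))
    return kl_pq + kl_qp

def pdr_penalty(logits_clean, logits_perturbed):
    """Penalty on the posterior shift between clean and perturbed inputs.

    In training, a term like this would be added to the task loss,
    encouraging predictions that are stable under input perturbation.
    """
    return sym_kl(softmax(logits_clean), softmax(logits_perturbed))
```

A training step would then minimize something like `task_loss + lam * pdr_penalty(...)`, where the perturbed logits come from, e.g., noised input embeddings; the penalty is zero when the two posteriors coincide and grows as they diverge.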