The predominant method for text matching tasks such as non-factoid answer selection and question similarity is to train a neural architecture on a large quantity of labeled in-domain data. This includes CNN and LSTM models with attention (Wang et al., 2016; Rücklé and Gurevych, 2017), compare-aggregate approaches (Wang and Jiang, 2017; Rücklé et al., 2019a), and, more recently, transformer-based models (Hashemi et al., 2020; Mass et al., 2019). Fine-tuning large pre-trained transformers such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019) currently achieves state-of-the-art performance on many related benchmarks (Garg et al., 2020; Mass et al., 2019; Rochette et al., 2019; Nogueira and Cho, 2019).
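As a concrete illustration of this fine-tuning setup, the sketch below trains a BERT cross-encoder on labeled question-answer pairs and ranks candidates by a relevance score. It is a minimal sketch using the Hugging Face transformers library; the model name, toy data, and hyperparameters are illustrative assumptions, not the exact configurations used in the cited work.

```python
# Minimal sketch: fine-tuning BERT as a cross-encoder for answer selection.
# Model name, toy data, and hyperparameters are illustrative assumptions.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # label 1 = relevant answer, 0 = irrelevant
)

# Toy labeled in-domain examples: (question, candidate answer, relevance label).
pairs = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page.", 1),
    ("How do I reset my password?", "Our office is open Monday to Friday.", 0),
]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for question, answer, label in pairs:
    # The cross-encoder consumes both texts jointly: [CLS] question [SEP] answer [SEP]
    inputs = tokenizer(question, answer, truncation=True, return_tensors="pt")
    loss = model(**inputs, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference time, candidate answers are ranked by the relevance probability.
model.eval()
with torch.no_grad():
    inputs = tokenizer("How do I reset my password?",
                       "Click 'Forgot password' on the login page.",
                       return_tensors="pt")
    score = model(**inputs).logits.softmax(dim=-1)[0, 1].item()
print(f"relevance score: {score:.3f}")
```

The cross-encoder design reflects why in-domain labels matter in this line of work: because the transformer attends over question and answer jointly, its relevance judgments are learned from supervised pairs rather than from independent sentence encodings.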