Model                                      NQ (EM)   TriviaQA (EM)   #Params

Extractive
  BM25+BERT (Mao et al., 2020)             37.7      60.1            110M
  Hard EM (Min et al., 2019a)              28.1      50.9            110M
  Path Retriever (Asai et al., 2020)       32.6      –               447M
  Graph Retriever (Min et al., 2019b)      34.5      56.0            110M
  ORQA                                     33.3      45.0            220M
  REALM (Guu et al., 2020)                 40.4      –               660M
  ProQA (Xiong et al., 2021)               34.3      –               220M
  DPR                                      41.5      56.8            220M
  RDR (Yang and Seo, 2020)                 42.1      57.0            110M
  GAR+DPR (Mao et al., 2020)               43.8      –               626M
  ColBERT (Khattab et al., 2020)           48.2      63.2            440M
  RIDER (GAR+DPR) (Mao et al., 2021)       48.3      –               626M
  UnitedQA-E (Cheng et al., 2021)          51.8      68.9            440M

Generative
  BM25+SSG (Mao et al., 2020)              35.3      58.6            406M
  T5-1.1+SSM                               35.2      61.6            11B
  RAG                                      44.5      56.8            516M
  DPR+SSG                                  42.2      –               516M
  FiD-base (Izacard and Grave, 2021)       48.2      65.0            333M
  FiD-large (Izacard and Grave, 2021)      51.4      67.6            848M
  FiD-large++                              54

Finally, we find that our R2-D2 system with a 21M-passage corpus is competitive even with FiD++, which uses a DPR retriever improved via knowledge distillation and a 26M-passage corpus that also includes lists. Additionally, we evaluate our model with a stronger retrieval model (HN-DPR), built from the DPR checkpoint by mining hard negatives with the retrieval model itself.11
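The HN-DPR step above can be sketched as follows. This is a minimal, hypothetical illustration of self-mined hard negatives, not the paper's implementation: the trained retriever ranks the corpus for each question, and top-ranked passages that do not contain the gold answer are kept as hard negatives for retraining. The `score` function here is a toy token-overlap stand-in for DPR's dense dot-product scorer; all names are illustrative.

```python
import re

def score(question: str, passage: str) -> int:
    """Toy relevance score: shared lowercase tokens.

    Stand-in for the retriever's real scorer (e.g. a DPR dot product
    between question and passage embeddings).
    """
    q_tokens = set(re.findall(r"\w+", question.lower()))
    p_tokens = set(re.findall(r"\w+", passage.lower()))
    return len(q_tokens & p_tokens)

def mine_hard_negatives(question: str, answer: str, corpus: list, k: int = 2) -> list:
    """Rank the corpus with the retriever itself; keep the top-k passages
    that do NOT contain the answer string -- these are the hard negatives."""
    ranked = sorted(corpus, key=lambda p: score(question, p), reverse=True)
    return [p for p in ranked if answer.lower() not in p.lower()][:k]

corpus = [
    "Paris is the capital of France.",            # positive: contains the answer
    "The capital of France has many museums.",    # looks relevant, lacks the answer
    "Berlin is the capital of Germany.",          # distractor
]
negs = mine_hard_negatives("What is the capital of France?", "Paris", corpus)
# The highest-scoring answer-free passage comes first.
```

The design point is that negatives mined this way are far harder than random or BM25 negatives, because they are exactly the passages the current retriever already confuses with the positives.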