“…Information Retrieval (IR) has a long and robust research history, supported by numerous publicly accessible datasets. Among the best known and most widely used are MS-MARCO [2], TREC [35,38,5], Common Crawl [32], and ClueWeb22 [24]. To address the challenges of IR, a range of models and methods have been proposed, from classic approaches such as the Vector Space Model (VSM) [31], Latent Semantic Indexing (LSI) [6], and BM25 to more modern transformer-based models such as RoBERTa [19], BERT [7], and T5 [27].…”