Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021
DOI: 10.1145/3447548.3467149
Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Abstract: Search engines play a crucial role in satisfying users' diverse information needs. Recently, Pre-trained Language Model (PLM) based text ranking models have achieved huge success in web search. However, many state-of-the-art text ranking approaches focus only on core relevance while ignoring other dimensions that contribute to user satisfaction, e.g., document quality, recency, and authority. In this work, we focus on ranking user satisfaction rather than relevance in web search, and propose a PLM-based fram…

Cited by 39 publications (10 citation statements) · References 76 publications
“…In more detail, there can be a stack of complex re-rankers after the efficient first-stage retriever. The multi-stage cascaded architecture is very common and practical both in industry (Yin et al., 2016; Liu et al., 2021d; Li and Xu, 2014) and on academic ranking leaderboards (Craswell et al., 2021). Considering the large computational cost of Transformer-based pre-trained models, they are often employed as the last-stage re-ranker, whose goal is to re-rank a small set of documents provided by the previous stage.…”
Section: Pre-training Methods Applied in Re-ranking Component
confidence: 99%
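A minimal sketch in Python (not code from the cited paper or from this work) of the cascade the quoted passage describes: a cheap first-stage scorer narrows a large corpus to a small candidate set, and an expensive PLM-style re-ranker scores only those candidates. Both scoring functions are hypothetical stand-ins supplied by the caller.

from typing import Callable, List, Tuple

def cascade_rank(
    query: str,
    corpus: List[str],
    cheap_score: Callable[[str, str], float],      # e.g. BM25 or embedding dot-product
    expensive_score: Callable[[str, str], float],  # e.g. a cross-encoder PLM
    first_stage_k: int = 1000,
    final_k: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: score the whole corpus cheaply (in production this is an inverted
    # index or an ANN index rather than a Python loop) and keep the top candidates.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:first_stage_k]
    # Stage 2: apply the expensive re-ranker only to the small candidate set.
    reranked = sorted(((d, expensive_score(query, d)) for d in candidates),
                      key=lambda pair: pair[1], reverse=True)
    return reranked[:final_k]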
“…In this section, we introduce recent works designing PTMs tailored for IR (Lee et al., 2019b; Chang et al., 2019; Ma et al., 2021b; Ma et al., 2021c; Boualili et al., 2020; Ma et al., 2021d; Zou et al., 2021; Liu et al., 2021d). General pre-trained models like BERT have achieved great success when applied to IR tasks at both the first-stage retrieval and the re-ranking stage.…”
Section: Keyphrase Extraction
confidence: 99%
“…However, the research on combining language models and user behavior data remains less developed. There are industrial works that fine-tune pretrained language models, for example based on BERT [33] or ERNIE [31, 69], to produce representations for text content and search queries. Yet these works are limited in the domains and tasks they support, i.e., they target a single use case such as web search and concern only scoring tasks, with no intention to support diverse domains or generation tasks.…”
Section: Language Models as Foundations
confidence: 99%
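A minimal sketch, assuming the Hugging Face Transformers and PyTorch libraries, of the dual-encoder pattern the quoted passage refers to: a BERT-style model produces dense representations for queries and documents that are compared by inner product. The checkpoint name is a generic placeholder; the industrial systems cited fine-tune domain models such as ERNIE on behavior data, which is not reproduced here.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Tokenize a batch and use the [CLS] vector as the text representation.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    cls = out.last_hidden_state[:, 0]                      # shape: [batch, hidden]
    return torch.nn.functional.normalize(cls, dim=-1)      # unit norm, so dot product = cosine

query_vec = embed(["pre-trained language models for retrieval"])
doc_vecs = embed(["PLM-based retrieval in web search", "recipes for apple pie"])
scores = query_vec @ doc_vecs.T                            # higher score = more similar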
“…Embedding-based retrieval has been widely applied in practice, such as in search engines [30, 39], question answering [33, 34, 54], online advertising [26, 31], and content-based recommender systems [27, 44]. Because documents must be retrieved from a large-scale corpus, where a brute-force linear scan is computationally infeasible, this calls for approximate nearest neighbour (ANN) search [28] so that documents with high embedding similarity can be selected efficiently.…”
Section: Related Work
confidence: 99%
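A minimal sketch, using NumPy with random placeholder vectors, of the embedding-based retrieval the quoted passage describes. The brute-force inner-product scan below is exactly what becomes infeasible at web scale; in practice it is replaced by an ANN index (e.g. HNSW graphs or IVF/PQ structures such as those in the FAISS library), which returns approximate top-k neighbours far faster.

import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, top_k = 128, 10_000, 5

# Placeholder document and query embeddings, L2-normalised so inner product = cosine.
doc_embeddings = rng.standard_normal((n_docs, dim)).astype(np.float32)
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query = rng.standard_normal(dim).astype(np.float32)
query /= np.linalg.norm(query)

# Brute-force maximum inner product search: O(n_docs * dim) per query.
scores = doc_embeddings @ query
top_ids = np.argpartition(-scores, top_k)[:top_k]
top_ids = top_ids[np.argsort(-scores[top_ids])]
print(list(zip(top_ids.tolist(), np.round(scores[top_ids], 3).tolist())))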