“…We provide six pre-trained models with downstream-task fine-tuning scripts: ProphetNet-En, pre-trained on 160GB of English raw text; ProphetNet-Zh, pre-trained on 160GB of Chinese raw text; ProphetNet-Multi, pre-trained on the 101GB Wiki-100 corpus and 1.5TB of Common Crawl data; ProphetNet-Dialog-En, pre-trained on a Reddit open-domain dialog corpus of 60 million sessions; ProphetNet-Dialog-Zh, pre-trained on a collected Chinese dialog corpus of over 30 million sessions; and ProphetNet-Code, pre-trained on 10 million code snippets and documents. ProphetNet-X achieves new state-of-the-art results on 10 benchmarks, including Chinese summarization (MATINF-SUMM (Xu et al., 2020a) and LCSTS (Hu et al., 2015)), Chinese question answering (MATINF-QA (Xu et al., 2020a)), cross-lingual generation (XGLUE NTG and XGLUE QG (Liang et al., 2020)), English summarization (MSNews (Liu et al., 2020a)), English dialog generation (DailyDialog (Li et al., 2017), PersonaChat (Zhang et al., 2018), and DSTC7-AVSD (Alamri et al., 2019)), and code summarization (CodeXGLUE (Lu et al., 2021)). Users can simply download the ProphetNet-X repository and find the corresponding pre-trained model together with its downstream-task fine-tuning scripts.…”
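To illustrate the download-and-use workflow described above, the following is a minimal inference sketch. It assumes the Hugging Face `transformers` port of ProphetNet-En (the `microsoft/prophetnet-large-uncased` checkpoint on the Hugging Face Hub) rather than the paper's own fairseq-based repository scripts; the library, checkpoint name, and generation settings are illustrative assumptions, not the repository's actual fine-tuning pipeline.

```python
# Minimal sketch, assuming the Hugging Face `transformers` port of
# ProphetNet-En. The ProphetNet-X repository itself ships fairseq-based
# fine-tuning scripts, which this example does not reproduce.
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

# Checkpoint name is an assumption: the English model as published on the
# Hugging Face Hub, not a file from the ProphetNet-X repository.
checkpoint = "microsoft/prophetnet-large-uncased"
tokenizer = ProphetNetTokenizer.from_pretrained(checkpoint)
model = ProphetNetForConditionalGeneration.from_pretrained(checkpoint)

article = (
    "ProphetNet is a sequence-to-sequence pre-training model that predicts "
    "the next n tokens simultaneously with future n-gram prediction."
)
inputs = tokenizer(article, return_tensors="pt")

# Beam-search decoding; beam size and length limits are illustrative defaults.
summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_beams=4,
    max_length=60,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

This sketch only demonstrates loading a released checkpoint and decoding with it; adapting the model to a specific benchmark would follow the corresponding fine-tuning script in the repository.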