“…OpenSubtitles: OpenSubtitles has been used extensively, including by [Wang, 2017, Sjöblom et al., 2018, Zilio et al., 2018, Gordon and Duh, 2020, Krišlauks and Pinnis, 2020]. Luo et al. [2021] and Askell et al. [2021] train models on datasets that include the OpenSubtitles subset of the Pile. DM Mathematics: DM Mathematics has been used extensively, including by [Cho et al., 2019, Qi and Wu, 2019, Talmor et al., 2020, Dinu et al., 2020, Firestone, 2020]. BookCorpus2: The BookCorpus dataset that BookCorpus2 is based on has been used extensively, including by [Karpathy and Fei-Fei, 2015, Reed et al., 2016, Ba et al., 2016, Devlin et al., 2018].…”