“…Challenge sets exist for a range of natural language processing (NLP) tasks, including Sentiment Analysis (Li et al., 2017; Mahler et al., 2017; Staliūnaitė and Bonfil, 2017), Natural Language Inference (McCoy and Linzen, 2019; Rocchietti et al., 2021), Question Answering (Ravichander et al., 2021), Machine Reading Comprehension (Khashabi et al., 2018), Machine Translation (MT) (King and Falkedal, 1990; Isabelle et al., 2017), and the more specific task of pronoun translation in MT (Guillou and Hardmeier, 2016).…”
* Equal contribution by all authors.