“…Rogers et al. [33] propose an "evidence format" for the explainable part of a dataset, composed of Modality (Unstructured text, Semi-structured text, Structured knowledge, Images, Audio, Video, Other combinations) and Amount of evidence (Single source, Multiple sources, Partial source, No sources). (a) spatial reasoning: bAbI [107], SpartQA [108]; (b) temporal reasoning: event order (QuAIL [109], TORQUE [110]), event attribution to time (TEQUILA [111], TempQuestions [112]), script knowledge (MCScript [113]), event duration (MCTACO [114], QuAIL [109]), temporal commonsense knowledge (MCTACO [114], TIMEDIAL [115]), factoid/news questions whose correct answers change with time (ArchivalQA [116], SituatedQA [117]), temporal reasoning in a multimodal setting (DAGA [118], TGIF-QA [119]); (c) belief states: Event2Mind [120], QuAIL [109]; (d) causal relations: ROPES [121], QuAIL [109], QuaRTz [122], ESTER [123]; (e) other relations between events: subevents, conditionals, counterfactuals, etc. (ESTER [123]); (f) entity properties and relations: social interactions (SocialIQa [124]), properties of characters (QuAIL [109]), physical properties (PIQA [125], QuaRel [126]), numerical properties (NumberSense [127]); (g) tracking entities: across locations (bAbI [107]), in coreference chains (Quoref [128],…”