“…Due to its massive scale and unconstrained nature, Ego4D has proved useful for various tasks, including action recognition (Liu et al., 2022a; Lange et al., 2023), action detection (Wang et al., 2023a), visual question answering (Bärmann & Waibel, 2022), active speaker detection (Wang et al., 2023d), natural language localisation, natural language queries (Ramakrishnan et al., 2023), gaze estimation (Lai et al., 2022), persuasion modelling for conversational agents (Lai et al., 2023b), audio-visual object localisation (Huang et al., 2023a), hand-object segmentation (Zhang et al., 2022b) and action anticipation (Ragusa et al., 2023a; Pasca et al., 2023; Mascaró et al., 2023). New tasks have also been introduced thanks to the diversity of Ego4D, e.g.…”