Some research has focused on developing datasets and NLP techniques for video understanding and accessibility [42,47,48,115,131,138]. Other work has built AI-based tools to support accessibility practices [6,11,66,93,110,126,134,135]. Major advances in multi-modal language models such as OpenAI's GPT-4V [90,91] and Google's Gemini [24,116] show that AI can already generate image descriptions, and some video descriptions, that attain high quality [135] and BLV user satisfaction [23,110].