Figure 1: ShortScribe makes short-form videos accessible with hierarchical video descriptions. ShortScribe extracts video data by identifying key frames, then applying automatic speech recognition (ASR), automated image description (BLIP-2), and optical character recognition (OCR). A large language model (GPT-4) then generates multiple descriptions from the extracted data. TikTok
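
The data flow in Figure 1 can be illustrated with a minimal Python sketch. This is not ShortScribe's implementation: the names `ShotData`, `select_key_frames`, `build_prompt`, and `describe` are hypothetical, the key-frame picker is a naive stand-in, the prompt wording and the "short" and "long" levels are assumptions, and the ASR, captioning, OCR, and language-model steps are passed in as plain callables rather than calls to any real model or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ShotData:
    """Text extracted for one key frame (hypothetical structure)."""
    caption: str   # image description, e.g. from a captioner such as BLIP-2
    ocr_text: str  # on-screen text recovered by OCR
    speech: str    # transcript segment recovered by ASR


def select_key_frames(frames: Sequence[str], step: int = 2) -> List[str]:
    """Naive stand-in for key-frame detection: keep every `step`-th frame."""
    return list(frames[::step])


def build_prompt(shots: Sequence[ShotData]) -> str:
    """Serialize the extracted data into one text prompt (assumed wording)."""
    lines = ["Describe this short-form video using the data below."]
    for i, shot in enumerate(shots, start=1):
        lines.append(
            f"Shot {i}: caption={shot.caption!r}; "
            f"on-screen text={shot.ocr_text!r}; speech={shot.speech!r}"
        )
    return "\n".join(lines)


def describe(
    frames: Sequence[str],
    caption_fn: Callable[[str], str],
    ocr_fn: Callable[[str], str],
    asr_fn: Callable[[str], str],
    llm_fn: Callable[[str], str],
    levels: Sequence[str] = ("short", "long"),  # assumed levels of detail
) -> Dict[str, str]:
    """Run extraction on the key frames, then request one description per level."""
    key_frames = select_key_frames(frames)
    shots = [ShotData(caption_fn(f), ocr_fn(f), asr_fn(f)) for f in key_frames]
    prompt = build_prompt(shots)
    return {level: llm_fn(f"[{level} description]\n{prompt}") for level in levels}


if __name__ == "__main__":
    # Stub callables stand in for the real models so the sketch runs offline.
    result = describe(
        frames=["frame0", "frame1", "frame2", "frame3"],
        caption_fn=lambda f: f"caption of {f}",
        ocr_fn=lambda f: "",
        asr_fn=lambda f: "",
        llm_fn=lambda prompt: prompt.splitlines()[0],
    )
    print(result)
```

Passing the models in as callables keeps the sketch independent of any particular ASR, captioning, OCR, or language-model library; in practice each callable would wrap whichever model is used for that step.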