This paper reports on a user-experience study undertaken as part of the H2020 project MeMAD ('Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy'), in which multimedia content describers from the television and archive industries tested Flow, an online platform designed to assist the post-editing of automatically generated data and thereby enhance the production of archival descriptions of film content. Our study captured the participant experience using screen recordings, the User Experience Questionnaire (UEQ), a benchmarked interactive-media questionnaire, and focus group discussions, and it reports a broadly positive post-editing environment. Users found the platform's collation of machine-generated content descriptions, transcripts, named entities (locations, persons, organisations) and translated text helpful and likely to enhance creative outputs in the longer term. Suggestions for improving the platform included the addition of specialist vocabulary functionality, shot-type detection, film-topic labelling and automatic music recognition. The study's most notable limitations are the current level of accuracy achieved in computer vision outputs (i.e. automated video descriptions of film material), which has been hindered by the lack of reliable and accurate training data, and the need for a more narratively oriented interface that allows describers to develop their storytelling techniques and build descriptions within a platform-hosted storyboarding functionality. While this work has value in its own right, it also paves the way for the future (semi-)automation of audio description to assist audiences experiencing sight impairment or cognitive accessibility difficulties, or for whom 'visionless' multimedia consumption is the preferred option.
Human beings find narrative sequencing in written texts and moving imagery a relatively simple task. Key to this activity is establishing coherence by using critical cues to identify the characters, objects, actions and locations that contribute to plot development. In the drive to make audiovisual media more widely accessible (through audio description) and media archives more searchable (through content description), computer vision researchers are striving to automate video captioning in order to supplement human description work. Existing models for automatic video description employ deep convolutional neural networks to encode the visual material and extract features (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al., 2015; He, Zhang, Ren, & Sun, 2016); recurrent neural networks then decode these visual encodings into sentences that describe the moving images in a manner mimicking human performance. However, such descriptions are currently “blind” to narrative coherence. Our study examines the human approach to narrative sequencing and coherence creation using the MeMAD (Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy) film corpus, comprising five hundred extracts chosen as stand-alone narrative arcs. We examine character recognition, object detection and temporal continuity as indicators of coherence, using linguistic analysis and qualitative assessment to inform the development of more narratively sophisticated computer models in the future.
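For readers unfamiliar with the encoder-decoder pattern referred to above, the sketch below illustrates its general shape in PyTorch: a convolutional backbone encodes individual frames, the frame features are pooled over time, and a recurrent decoder generates caption tokens conditioned on that video encoding. This is a minimal illustration under stated assumptions only; the ResNet-50 backbone, LSTM decoder, vocabulary size and all dimensions are placeholders chosen for demonstration and do not reproduce the cited models or those used in MeMAD.

```python
# Minimal sketch of a CNN-encoder / RNN-decoder video captioner.
# Illustrative only: backbone, decoder and hyperparameters are assumptions,
# not the architectures used in the MeMAD project or the cited papers.
import torch
import torch.nn as nn
from torchvision import models


class FrameEncoder(nn.Module):
    """Encode each video frame with a pretrained CNN, then pool over time."""

    def __init__(self, feature_dim=512):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        self.project = nn.Linear(backbone.fc.in_features, feature_dim)

    def forward(self, frames):                      # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)     # (b*t, 2048)
        feats = self.project(feats).view(b, t, -1)            # (b, t, feature_dim)
        return feats.mean(dim=1)                               # temporal average pooling


class CaptionDecoder(nn.Module):
    """Decode a caption token-by-token with an LSTM conditioned on the video code."""

    def __init__(self, vocab_size, feature_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feature_dim)
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, video_code, captions):        # captions: (batch, seq_len) token ids
        # Prepend the video encoding as the first step of the input sequence.
        inputs = torch.cat([video_code.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                      # (batch, seq_len + 1, vocab_size)


if __name__ == "__main__":
    encoder, decoder = FrameEncoder(), CaptionDecoder(vocab_size=10_000)
    dummy_frames = torch.randn(2, 8, 3, 224, 224)    # 2 clips of 8 frames each
    dummy_tokens = torch.randint(0, 10_000, (2, 12)) # 2 partial captions
    logits = decoder(encoder(dummy_frames), dummy_tokens)
    print(logits.shape)                              # torch.Size([2, 13, 10000])
```

In such a design the decoder sees only pooled visual features, which is one reason its output remains "blind" to narrative coherence across shots: nothing in the model tracks characters, objects or locations over longer spans of the film.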