The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopy, large screens) and personalization (selection and control of content, availability on many devices). We investigate the feasibility of an end-to-end format-agnostic approach to supporting both trends. In this paper, different aspects of a format-agnostic capture, production, delivery and rendering system are discussed. At the capture stage, the concept of a layered scene representation is introduced, including panoramic video and 3D audio capture. At the analysis stage, a virtual director component is discussed that allows for the automatic execution of cinematographic principles, using feature tracking and saliency detection. At the delivery stage, resolution-independent audiovisual transport mechanisms for both managed and unmanaged networks are treated. At the rendering stage, a rendering process is introduced that manipulates the audiovisual content to match the properties of the connected displays and loudspeakers. Different parts of the complete system are revisited to demonstrate the requirements and the potential of this advanced concept.
In everyday life, speech is often accompanied by a situation-specific acoustic cue: a hungry bark as you ask 'Has anyone fed the dog?'. This paper investigates the effect such cues have on speech intelligibility in noise and evaluates their interaction with the established effect of situation-specific semantic cues. This work is motivated by the introduction of new object-based broadcast formats, which have the potential to optimise intelligibility by controlling the level of individual broadcast audio elements at the point of service. The results of this study show that situation-specific acoustic cues alone can improve word recognition in multi-talker babble by 69.5%, a similar amount to semantic cues. The combination of semantic and acoustic cues provides a further improvement of 106.0% compared with no cues, and 18.7% compared with semantic cues only. Interestingly, whilst increasing the subjective intelligibility of the target word, the presence of acoustic cues degraded the objective intelligibility of the speech-based semantic cues by 47.0% (equivalent to reducing the speech level by 4.5 dB). This paper discusses the interactions between the two types of cue and the implications that these results have for assessing and improving the intelligibility of broadcast speech.
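As a hedged illustration of the point-of-service level control that object-based formats enable (not code from the study), the sketch below mixes hypothetical audio objects while applying a dB gain to the speech object before summation; the 4.5 dB value is reused from the abstract purely as an example setting, and the object names are invented.

    import numpy as np

    def db_to_linear(db):
        """Convert a gain in dB to a linear amplitude factor."""
        return 10.0 ** (db / 20.0)

    def render_mix(objects, speech_boost_db=4.5):
        """Sum audio objects, boosting any object whose name marks it as speech."""
        gain = db_to_linear(speech_boost_db)
        mix = np.zeros_like(next(iter(objects.values())))
        for name, signal in objects.items():
            mix += signal * (gain if name.startswith("speech") else 1.0)
        return mix

    # Hypothetical 1-second objects at 48 kHz.
    t = np.linspace(0, 1, 48000, endpoint=False)
    objects = {
        "speech_dialogue": 0.1 * np.sin(2 * np.pi * 220 * t),
        "background_babble": 0.05 * np.random.randn(t.size),
    }
    mix = render_mix(objects, speech_boost_db=4.5)

Because the objects stay separate until this final render, the same gain decision could instead be driven per listener or per device, which is what makes intelligibility optimisation at the point of service possible at all.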