This paper describes the acquisition and content of a new multi-modal database. Some tools for making use of the data streams are also presented. The Computational AudioVisual Analysis (CAVA) database is a unique collection of three synchronised data streams obtained from a binaural microphone pair, a stereoscopic camera pair and a head tracking device. All recordings are made from the perspective of a person; i.e. what would a human with natural head movements see and hear in a given environment. The database is intended to facilitate research into humans' ability to optimise their multi-modal sensory input and fills a gap by providing data that enables human centred audiovisual scene analysis. It also enables 3D localisation using either audio, visual, or audiovisual cues. A total of 50 sessions, with varying degrees of visual and auditory complexity, were recorded. These range from seeing and hearing a single speaker moving in and out of field of view, to moving around a 'cocktail party' style situation, mingling and joining different small groups of people chatting.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.