In this paper we propose a pipeline for estimating acoustic 3D room structure with geometry and attribute prediction using spherical 360 • cameras. Instead of setting microphone arrays with loudspeakers to measure acoustic parameters for specific rooms, a simple and practical single-shot capture of the scene using a stereo pair of 360 cameras can be used to simulate those acoustic parameters. We assume that the room and objects can be represented as cuboids aligned to the main axes of the room coordinate (Manhattan world). The scene is captured as a stereo pair using off-the-shelf consumer spherical 360 cameras. A cuboid-based 3D room geometry model is estimated by correspondence matching between captured images and semantic labelling using a convolutional neural network (SegNet). The estimated geometry is used to produce frequency-dependent acoustic predictions of the scene. This is, to our knowledge, the first attempt in the literature to use visual geometry estimation and object classification algorithms to predict acoustic properties. Results are compared to measurements through calculated reverberant spatial audio object parameters used for reverberation reproduction customized to the given loudspeaker set up.
In this paper, we propose an approach to indoor scene understanding from observation of people in single view spherical video. As input, our approach takes a centrally located spherical video capture of an indoor scene, estimating the 3D localisation of human actions performed throughout the long term capture. The central contribution of this work is a deep convolutional encoder-decoder network trained on a synthetic dataset to reconstruct regions of affordance from captured human activity. The predicted affordance segmentation is then applied to compose a reconstruction of the complete 3D scene, integrating the affordance segmentation into 3D space. The mapping learnt between human activity and affordance segmentation demonstrates that omnidirectional observation of human activity can be applied to scene understanding tasks such as 3D reconstruction. We show that our approach using only observation of people performs well against previous approaches, allowing reconstruction of occluded regions and labelling of scene affordances.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.