2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.141

Generative Modeling of Audible Shapes for Object Perception

Abstract: Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging, due to the great difficulty in capturing large-scale, clean data of objects with both their appearance and the sound they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data, purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accurac…

Cited by 30 publications (29 citation statements, published 2018–2024) · References 34 publications

Citation statements (ordered by relevance):
“…Material properties are revealed by the sounds objects make when hit with a drumstick, and can be used to synthesize new sounds from silent videos [33]. Recurrent networks [54] or conditional generative adversarial networks [7] can generate audio for input video frames, while powerful simulators can synthesize audio-visual data for 3D shapes [52]. Rather than generate audio from scratch, our task entails converting an input one-channel audio to two-channel binaural audio guided by the visual frames.…”
Section: Related Work (mentioning)
confidence: 99%
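To make the mono-to-binaural task in the excerpt above concrete, the sketch below shows one way visual features could condition such a conversion: a small encoder-decoder that predicts a left/right difference spectrogram from a mono spectrogram fused with a visual feature vector. The architecture, layer sizes, and fusion-by-concatenation scheme are assumptions for illustration only, not the cited paper's model.

```python
import torch
import torch.nn as nn

class MonoToBinaural(nn.Module):
    """Illustrative sketch (hypothetical sizes): predict a left/right
    difference spectrogram from a mono spectrogram, conditioned on a
    visual feature vector extracted from the video frames."""
    def __init__(self, visual_dim=512):
        super().__init__()
        # encode the mono STFT (2 channels: real/imaginary parts)
        self.audio_enc = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        self.visual_proj = nn.Linear(visual_dim, 64)
        # decode back to a 2-channel (real/imaginary) difference spectrogram
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, mono_spec, visual_feat):
        a = self.audio_enc(mono_spec)                        # (B, 64, F/4, T/4)
        v = self.visual_proj(visual_feat)                    # (B, 64)
        v = v[:, :, None, None].expand(-1, -1, a.shape[2], a.shape[3])
        fused = torch.cat([a, v], dim=1)                     # fuse by channel concatenation
        return self.decoder(fused)                           # predicted (L - R) spectrogram

# usage sketch: spectrogram of shape (batch, 2, freq, time), visual feature (batch, 512)
out = MonoToBinaural()(torch.randn(1, 2, 256, 64), torch.randn(1, 512))
```

The predicted difference signal would then be added to and subtracted from the mono input to form the two binaural channels; other fusion and output parameterizations are equally plausible.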
“…To calculate this function, the traditional approach is the fast multipole method [26], but its computational cost is very high, so we use the finite-difference time-domain method described in [23].…”
Section: Acoustic (mentioning)
confidence: 99%
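For context on the finite-difference time-domain method mentioned above: it discretizes the acoustic wave equation on a grid and advances pressure and particle velocity in a leapfrog fashion. The following is a generic 1-D illustration with assumed parameters (grid spacing, air density, Gaussian source), not code from either of the cited works.

```python
import numpy as np

# Minimal 1-D staggered-grid FDTD sketch for acoustic wave propagation.
c = 343.0            # speed of sound in air (m/s)
rho = 1.2            # air density (kg/m^3)
dx = 1e-3            # spatial step (m)
dt = 0.5 * dx / c    # time step satisfying the CFL stability condition
n_cells, n_steps = 400, 2000

p = np.zeros(n_cells)        # pressure samples at cell centers
v = np.zeros(n_cells + 1)    # particle velocity at cell interfaces (staggered)

for t in range(n_steps):
    # drive the field with a short Gaussian pulse at the left end (soft source)
    p[0] += np.exp(-0.5 * ((t - 100) / 20.0) ** 2)
    # leapfrog update: velocity from the pressure gradient, then pressure
    v[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])
    p -= dt * rho * c**2 / dx * (v[1:] - v[:-1])
```

The same update structure extends to 3-D grids, which is what makes the method attractive for computing acoustic transfer functions of arbitrary shapes despite its memory cost.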
“…Previous work has used sound to better understand objects in scenes. For instance, impact sounds from interacting with objects in a scene have been used to perform segmentation […] and to emulate the sensory interactions of human information processing [Zhang et al. 2017a]. Audio has also been used to compute material [Ren et al. 2013], object [Zhang et al. 2017a], scene [Schissler et al. 2018], and acoustical [Tang et al. 2020] properties.…”
Section: Introduction (mentioning)
confidence: 99%