2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00035

Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis

Cited by 31 publications
(17 citation statements)
References 19 publications
“…However it has higher perceptual errors (RTE and MOSE). Image2Reverb's [55] high errors reveal the difficulty of our task and data, and its inapplicability to AVSpeech highlights our model's self-supervised training advantage. Despite having the estimated RT60 as input (and thus having low RT60 error), Blind Reverberator's STFT and MOS errors are much higher than AViTAR's, showing that images are a promising way to characterize room acoustics beyond the traditional RT60.…”
Section: Results On Soundspaces-speech (mentioning)
confidence: 99%
“…We demonstrate our approach on challenging real-world sounds and environments, as well as controlled experiments with realistic acoustic simulations in scanned scenes. Our quantitative results and subjective evaluations via human studies show that our model generates audio that matches the target environment with high perceptual quality, outperforming a state-of-the-art model that has heavier supervision requirements [55] as well as traditional acoustic matching models.…”
Section: Source Audio (mentioning)
confidence: 92%