2022
DOI: 10.48550/arxiv.2205.14807
Preprint

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Abstract: Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear-related filtrations, which, however, are difficult to accurately simulate in traditional digital signal processing. In this paper, we formulate the synt…
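The "basic physical warping" named in the abstract can be pictured, very roughly, as delaying the mono signal by a different amount for each ear. The sketch below is a simplified illustration under stated assumptions, not the BinauralGrad method: it uses the Woodworth spherical-head approximation for the interaural time difference (ITD), an assumed head radius of 8.75 cm, an assumed speed of sound of 343 m/s, and whole-sample delays. The room reverberation and head/ear filtering that the abstract calls hard to simulate are deliberately left out. The names `woodworth_itd` and `naive_binaural` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 °C)
HEAD_RADIUS = 0.0875    # m, assumed average adult head radius

def woodworth_itd(azimuth_rad):
    # Woodworth spherical-head approximation of the interaural time difference (seconds),
    # for a distant source with 0 <= azimuth <= pi/2 (source on the right side).
    return HEAD_RADIUS / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))

def naive_binaural(mono, sample_rate, azimuth_rad):
    # Pure-delay "warping" only: the far (left) ear hears the signal later by the ITD,
    # rounded to whole samples. No reverberation, HRTF filtering, or level difference.
    delay = min(round(woodworth_itd(azimuth_rad) * sample_rate), len(mono))
    left = [0.0] * delay + list(mono[: len(mono) - delay])
    right = list(mono)
    return left, right

impulse = [1.0] + [0.0] * 99
left, right = naive_binaural(impulse, 48_000, math.pi / 2)
print(left.index(1.0))  # lag of the left channel, in samples
```

For a source at 90° on the right sampled at 48 kHz, the left channel lags by about 31 samples (roughly 0.66 ms). Everything this toy model ignores is exactly what a learned model has to supply.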

Cited by 3 publications (4 citation statements)
References: 19 publications

“…One major drawback of DPM-based models is the slow sampling speed due to many iterative steps. Therefore, many previous DPM-based TTS methods focus on accelerating the sampling method to boost the inference speed [38], [39], [56], [57], [58]. Some research considers changing the training process to generate high-quality speech.…”
Section: DPM-based TTS; citation type: mentioning
confidence: 99%
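The slow sampling this excerpt refers to follows from the structure of DDPM ancestral sampling (Ho et al., 2020): drawing one sample needs a full pass of the denoising network at every one of the T steps. The sketch below assumes a linear β schedule and the choice σ_t² = β_t; `ddpm_sample`, `make_point_mass_eps_model` and the other helpers are hypothetical names, and the noise predictor is a closed-form stand-in (the exact predictor when every training sample equals one fixed vector), not a trained network. It counts network calls to make the linear cost in T explicit.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Noise variances beta_1..beta_T, linearly spaced (assumed schedule); needs T >= 2.
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def make_point_mass_eps_model(x0, betas, counter):
    # Hypothetical stand-in for a trained network. If all training data equal x0,
    # the optimal noise prediction is (x_t - sqrt(alpha_bar_t) * x0) / sqrt(1 - alpha_bar_t).
    abar = cumulative_alphas(betas)
    def eps_model(x, t):
        counter["calls"] += 1
        a = abar[t]
        return [(xi - math.sqrt(a) * x0i) / math.sqrt(1.0 - a) for xi, x0i in zip(x, x0)]
    return eps_model

def ddpm_sample(eps_model, dim, betas, rng):
    # Ancestral sampling: start from pure noise x_T and apply T denoising steps.
    abar = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):         # 0-based t = T-1, ..., 0
        beta = betas[t]
        eps = eps_model(x, t)                      # one (expensive) network call per step
        coef = beta / math.sqrt(1.0 - abar[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)                # sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                               # no noise is added at the final step
    return x

T = 50
betas = linear_beta_schedule(T)
counter = {"calls": 0}
target = [0.5, -0.3, 0.8]
sample = ddpm_sample(make_point_mass_eps_model(target, betas, counter), 3, betas, random.Random(0))
print(counter["calls"])  # prints 50: one network call per diffusion step
```

The number of network evaluations equals T, so this per-step cost is what the accelerated samplers mentioned in the excerpt aim to reduce. Because the toy predictor is exact, the last step lands on the target vector; a trained network would only approximate it.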
“…Denoising diffusion probabilistic models (DDPMs) (Ho et al, 2020) have demonstrated their great generation potential on various applications, such as text-to-image synthesis (Poole et al, 2022;Gu et al, 2022;Kim & Ye, 2021), image inpainting (Lugmayr et al, 2022;Liu et al, 2022;Kawar et al, 2022), speech synthesis (Huang et al, 2021;Lam et al, 2022;Leng et al, 2022), and molecular conformation generation (Hoogeboom et al, 2022;Jing et al, 2022;Wu et al, 2022;Huang et al, 2022). It involves a diffusion process to gradually add noise to data, and a parameterized denoising process to reverse the diffusion process, sampling through gradually removing the noise from random noise.…”
Section: Denoising Diffusion Probabilistic Models; citation type: mentioning
confidence: 99%
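The diffusion process described in this excerpt follows Ho et al. (2020) and has a closed form, so a noisy x_t can be drawn for any step t directly from clean data x_0: x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, with ε ~ N(0, I) and ᾱ_t = ∏_{s≤t} (1 − β_s). The sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in that paper; `q_sample` and the other helper names are hypothetical, and the three-value "signal" is a toy stand-in for an audio waveform.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Noise variances beta_1..beta_T, linearly spaced; needs T >= 2.
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s): the share of signal variance left at step t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    # Forward diffusion in closed form (0-based t): x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    a = abar[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]

betas = linear_beta_schedule(1000)
abar = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.3, 0.8]                   # toy three-sample "signal"
x_early = q_sample(x0, 0, abar, rng)    # nearly identical to x0
x_late = q_sample(x0, 999, abar, rng)   # nearly pure Gaussian noise
print(abar[0], abar[-1])                # signal weight at the first and last step
```

With this schedule ᾱ falls from 0.9999 at the first step to about 4 × 10⁻⁵ at the last, which is why sampling can start from pure Gaussian noise. Training only ever needs this one-shot jump, while the reverse (denoising) direction must be run step by step.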
“…In recent years, denoising diffusion probabilistic models (DDPMs) (Ho et al, 2020) have been proven to have potential in data generation tasks such as text-to-image generation (Poole et al, 2022;Gu et al, 2022;Kim & Ye, 2021;Chen et al, 2022), speech synthesis (Huang et al, 2021;Lam et al, 2022;Leng et al, 2022), and molecular conformation formation (Hoogeboom et al, 2022;Jing et al, 2022;Wu et al, 2022;Huang et al, 2022). They build a diffusion process to add noise into the sample and a denoising process to remove noise from the sample gradually.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…In our approach, we decided that rather than relying on visual cues, allowing direct customization of the output 3D signal with an interactive interface would allow for more freedom and generate more satisfactory results. This paper presented a synthesis process in order to generate binaural audio from a mono audio source, also similar to what we are trying to accomplish [10]. Instead of utilizing head-related transfer functions, they created a novel process for generating binaural audio utilizing diffusion models.…”
Section: Related Work; citation type: mentioning
confidence: 99%