2019
DOI: 10.1007/978-3-030-37731-1_49

HRTF Representation with Convolutional Auto-encoder

Citations: cited by 2 publications (3 citation statements)
References: 16 publications
“…In previous works [27], [28], individual HRTF synthesis with individual pinna images employed three sub-networks: the variational autoencoder (VAE), fully connected (FC) layers, and conditional VAE (CVAE). While VAE and CVAE models are commonly used for data reconstruction [29], multi-step learning methods can be inefficient and yield suboptimal optimization results since each network is trained separately [48]. To address this limitation, we propose an end-to-end network for unified optimization of HRTF individualization, spanning from the pinna image to HRTF magnitude.…”
Section: Neural Network Structure and Direction-wise Training
Citation type: mentioning
confidence: 99%
“…More recently, based on experimental findings highlighting the role of ear pinna in generating spectral cues of HRTFs [26], DNN architectures that generate the magnitude spectra of individual HRTFs using only pinna images have been proposed [27], [28]. These DNNs typically employ an autoencoder structure, known for efficient dimensionality reduction of HRTFs [29]. They consist of three sub-networks that convert pinna images through latent variables to HRTF magnitude.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…A separate DNN is then trained to predict these latent vectors using anthropometric data as input. Lastly, in a very recent paper, W. Chen et al [6] train a convolutional denoising autoencoder on 2D frequency-elevation input features derived from listener-specific directional components of HRTFs, with the purpose of optimizing HRTF dataset storage.…”
Section: Related Work
Citation type: mentioning
confidence: 99%