2021
DOI: 10.1109/taslp.2021.3060257

Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation

Cited by 36 publications (28 citation statements)
References 41 publications
“…M is the number of azimuth directions, here M = 181. d̂_i and d_i are the predicted and ground-truth DOA coding of the target speaker. Based on the likelihood-based coding in [21], the desired ground-truth values d_i are defined as follows:…”
Section: End-to-end Training
Citation type: mentioning
confidence: 99%
“…In the SSL literature, a great proportion of systems focuses on localizing speech sources, because of its importance in related tasks such as speech enhancement or speech recognition. Examples of speaker localization systems can be found in [39], [40], [41], [42]. In such systems, the neural networks are trained to estimate the DoA of speech sources so that they are somehow specialized in this type of source.…”
Section: B Source Types
Citation type: mentioning
confidence: 99%
“…Several systems consider only the magnitude spectrograms, such as [52], [140], [199], [204], while other consider only the phase spectrogram [128], [203]. When considering both magnitude and phase, they can be stacked also in a third dimension (as well as channels). This representation has been employed in many neural-based SSL systems [41], [70], [131], [143], [147], [148], [152], [153], [187]. Other systems proposed to decompose the complex-valued spectrograms into real and imaginary parts [42], [119], [192], [205].…”
Section: Spectrogram-based Features
Citation type: mentioning
confidence: 99%
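
The two input representations named in the statement above (magnitude and phase stacked along a channel-like dimension, versus real and imaginary parts) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any cited system: the function names, the Hann window, and the `n_fft` and `hop` values are all hypothetical choices made for the example.

```python
import numpy as np


def stft_frames(x, n_fft=512, hop=256):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1).

    Window type and sizes are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def multichannel_stft(signals, n_fft=512, hop=256):
    """signals: (n_mics, n_samples) -> complex array (n_mics, frames, bins)."""
    return np.stack([stft_frames(ch, n_fft, hop) for ch in signals])


def magnitude_phase_features(signals, n_fft=512, hop=256):
    """Stack magnitude and phase along the first (channel-like) axis.

    Output shape: (2 * n_mics, frames, bins).
    """
    spec = multichannel_stft(signals, n_fft, hop)
    return np.concatenate([np.abs(spec), np.angle(spec)], axis=0)


def real_imag_features(signals, n_fft=512, hop=256):
    """Stack real and imaginary parts along the first (channel-like) axis.

    Output shape: (2 * n_mics, frames, bins).
    """
    spec = multichannel_stft(signals, n_fft, hop)
    return np.concatenate([spec.real, spec.imag], axis=0)
```

Both variants carry the same information, since magnitude times the cosine or sine of the phase gives back the real or imaginary part; which one suits a given network is a design choice the quoted statement leaves to the individual systems.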