ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747166
Real Additive Margin Softmax for Speaker Verification

Cited by 14 publications (5 citation statements)
References 19 publications
“…The second is to add an additional trainable neural network module, e.g., decision residual networks (Dr-vectors) [21], deep learning backend (DLB) [22] and tied variational autoencoder (TVAE) [23]. The last is to develop a robust backend against domain mismatch, such as Coral++ [24], domain-aware batch normalization (DABN) and domain-agnostic instance normalization (DAIN) [25], information-maximized variational domain adversarial neural network (InfoVDANN) [26], etc. However, these algorithms rarely use spatial or graph information among the extracted embeddings, which may significantly boost the performance.…”
Section: Introduction
confidence: 99%
“…AAMSoftmax [16] is most commonly used as a loss function in speaker verification tasks, and it can be expressed as (1):…”
Section: B. Adaptive Loss Functions in Speaker Verification Tasks
confidence: 99%
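The excerpt above cuts off before the equation it refers to. For reference, below is a minimal PyTorch sketch of the standard AAM-Softmax (additive angular margin) objective that the citing paper denotes as equation (1); the class name, embedding dimension, and the margin/scale defaults here are illustrative assumptions, not values taken from the cited paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAMSoftmaxLoss(nn.Module):
    """Sketch of the additive angular margin (AAM) softmax loss:
    cross-entropy over scaled cosine logits, with a margin m added
    to the target-class angle."""
    def __init__(self, embedding_dim, num_speakers, margin=0.2, scale=32.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_speakers, embedding_dim))
        nn.init.xavier_normal_(self.weight)
        self.margin = margin
        self.scale = scale

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Add the angular margin m to the target-class angle theta_y.
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        target_logit = torch.cos(theta + self.margin)
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
        logits = self.scale * (one_hot * target_logit + (1.0 - one_hot) * cosine)
        return F.cross_entropy(logits, labels)
```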
“…Then, an 80-dimensional filter bank is extracted as acoustic features, using a 25 ms window length and a 10 ms hop size. The loss function is AAM-softmax [24] with a 0.2 margin and a scale of 32. ResNets adopt an SGD optimizer with a momentum of 0.9 and a weight decay of 1e-4, while DF-ResNets are optimized using AdamW [68] with a weight decay of 0.05.…”
Section: Training Strategies
confidence: 99%
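As a sketch of how the quoted hyper-parameters could be wired together, the snippet below sets up the feature extraction, optimizers, and loss with those values, assuming PyTorch and torchaudio. The file path, learning rates, backbone stand-ins, embedding dimension, and speaker count are placeholders; only the filter-bank settings, margin, scale, momentum, and weight decays come from the excerpt.

```python
import torch
import torchaudio

# 80-dimensional filter-bank features with a 25 ms window and 10 ms hop,
# as described in the excerpt (16 kHz mono audio assumed; path is a placeholder).
waveform, sr = torchaudio.load("utt.wav")
feats = torchaudio.compliance.kaldi.fbank(
    waveform, num_mel_bins=80, frame_length=25.0, frame_shift=10.0,
    sample_frequency=sr,
)

# Stand-in modules; the actual systems use ResNet / DF-ResNet backbones.
resnet = torch.nn.Linear(80, 256)
df_resnet = torch.nn.Linear(80, 256)

# Optimizers with the quoted settings; the learning rates are assumptions.
sgd = torch.optim.SGD(resnet.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
adamw = torch.optim.AdamW(df_resnet.parameters(), lr=1e-3, weight_decay=0.05)

# AAM-softmax with the quoted margin of 0.2 and scale of 32, reusing the
# AAMSoftmaxLoss sketch above (embedding dim and speaker count illustrative).
criterion = AAMSoftmaxLoss(embedding_dim=256, num_speakers=5994,
                           margin=0.2, scale=32.0)
```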
“…Lastly, the whole system is trained as a multi-class speaker classifier. To enhance system performance, researchers have made numerous efforts in multiple aspects, such as network backbones [4]- [15], pooling strategies [16]- [20], and training criteria [21]- [24].…”
Section: Introduction
confidence: 99%
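The excerpt above summarizes the standard embedding-based speaker verification pipeline: a frame-level backbone, an utterance-level pooling layer, an embedding layer, and a multi-class speaker classifier head used during training. A compact sketch of that structure is given below; all layer types and sizes are chosen for illustration and are not taken from the cited systems.

```python
import torch
import torch.nn as nn

class SpeakerEmbeddingNet(nn.Module):
    """Minimal embedding extractor: frame-level backbone, statistics pooling,
    an embedding layer, and a classifier head applied during training."""
    def __init__(self, feat_dim=80, hidden_dim=512, emb_dim=256, num_speakers=5994):
        super().__init__()
        # Frame-level backbone (stand-in for ResNet / TDNN variants).
        self.backbone = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.embedding = nn.Linear(2 * hidden_dim, emb_dim)
        self.classifier = nn.Linear(emb_dim, num_speakers)

    def forward(self, feats):                        # feats: (batch, frames, feat_dim)
        x = self.backbone(feats.transpose(1, 2))     # (batch, hidden_dim, frames)
        # Statistics pooling: mean and std over frames -> utterance-level vector.
        stats = torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)
        emb = self.embedding(stats)
        return emb, self.classifier(emb)
```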