2021
DOI: 10.48550/arxiv.2112.04459
Preprint

Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

citations
Cited by 1 publication
(2 citation statements)
references
References 0 publications
“…To address the class-collision problem in the MoCo system, false negative filtering and infoNCE loss re-weighting were proposed based on the well designed experimental analysis. In combination with the ProtoNCE loss, the formulated C3-MoCo system achieved a 8.6% relative improvement over the best contrastive learning based model [39]. And the proposed C3-DINO (with C3-MoCo assisted) speaker embedding system pushed the benchmark Vox2 test set to 2.5% EER, a new record representing a 48.6% relative improvement against the previous SOTA SSL based SV system.…”
Section: Discussion (mentioning)
confidence: 92%
“…In [29], [43], the authors extended the self-supervised framework to a semi-supervised paradigm where a small portion of the whole dataset is labeled, which provided a good direction to utilize the real-world data. In [39], a self-supervised regularization term was proposed in addition to the siamese network, an improved SV performance was obtained in the Voxceleb benchmark. In [44], A DINO based speaker embedding system demonstrates the potential of SSL models for SV, a 4.83% EER on Voxceleb1 test set represents the current SOTA performance.…”
Section: Introduction (mentioning)
confidence: 99%