2023
DOI: 10.3390/s23031203

ResSKNet-SSDP: Effective and Light End-To-End Architecture for Speaker Recognition

Abstract: In speaker recognition tasks, convolutional neural network (CNN)-based approaches have shown significant success. Modeling the long-term contexts and efficiently aggregating the information are two challenges in speaker recognition, and they have a critical impact on system performance. Previous research has addressed these issues by introducing deeper, wider, and more complex network architectures and aggregation methods. However, it is difficult to significantly improve the performance with these approaches …
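
The abstract names information aggregation as one of the two central challenges. For reference only, the sketch below shows plain mean plus standard-deviation statistics pooling, a widely used baseline that turns a variable-length sequence of frame-level CNN features into one fixed-length utterance vector. It is not the SSDP method from the paper, whose details are not given here; the function name and the list-of-lists input format are assumptions for illustration.

```python
import math
from typing import List


def statistics_pooling(frames: List[List[float]]) -> List[float]:
    """Hypothetical helper: collapse T frame-level vectors (each of length D)
    into one 2*D utterance-level vector made of the per-dimension mean
    followed by the per-dimension population standard deviation."""
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    means = [sum(f[j] for f in frames) / num_frames for j in range(dim)]
    stds = [
        math.sqrt(sum((f[j] - means[j]) ** 2 for f in frames) / num_frames)
        for j in range(dim)
    ]
    return means + stds


# Utterances of different lengths map to vectors of the same size.
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[0.5, 1.0], [1.5, 3.0], [2.5, 5.0], [3.5, 7.0]]
print(statistics_pooling(short_utt))      # [2.0, 2.0, 1.0, 0.0]
print(len(statistics_pooling(long_utt)))  # 4
```

Because the output length depends only on D, not on T, a fixed-size classifier or embedding layer can follow the pooling step regardless of utterance duration.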

Cited by 4 publications (2 citation statements) · References: 46 publications

“…Compared with the hyper-LPR model, the UNET-GWO-SVM model designed in this paper dramatically improves the positioning accuracy and recognition speed. On the other hand, there is still a gap compared with different end-to-end neural network recognition algorithms [41][42][43][44]. However, they consider the applicability of embedded hardware.…”
Section: License Plate Recognition Experiments (mentioning; confidence: 99%)

“…Deng et al. [3] designed an end-to-end speaker recognition system, ResSKNet-SSDP, with an improved feature extraction capability and improved adaptation to the speaker recognition task. The proposed system makes it more suitable for practical application as it is more efficient in terms of the Equal Error Rate (EER) and detection cost function (DCF), and its structure is lightweight with fewer parameters and shorter inference time compared to many of the existing methods.…”
(mentioning; confidence: 99%)
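
The statement above credits ResSKNet-SSDP with better EER and DCF. As a reference for what those two metrics measure, here is a minimal sketch that computes both from speaker-verification trial scores. Assumptions not taken from the paper: higher scores mean "same speaker", labels are 1 (target) and 0 (non-target), thresholds are swept over the observed scores without interpolation, and the DCF parameters default to P_target = 0.01 and C_miss = C_fa = 1 purely for illustration. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate target (label 1) and non-target (label 0) trial scores."""
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")
    return targets, nontargets


def _operating_points(scores, labels) -> List[Tuple[float, float]]:
    """(P_miss, P_fa) at every candidate threshold; a trial is accepted
    when its score is >= the threshold."""
    targets, nontargets = _split(scores, labels)
    uniq = sorted(set(scores))
    thresholds = uniq + [uniq[-1] + 1.0]  # the extra threshold rejects every trial
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in targets) / len(targets)
        p_fa = sum(s >= t for s in nontargets) / len(nontargets)
        points.append((p_miss, p_fa))
    return points


def compute_eer(scores, labels) -> float:
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    p_miss, p_fa = min(_operating_points(scores, labels), key=lambda p: abs(p[0] - p[1]))
    return (p_miss + p_fa) / 2


def compute_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0) -> float:
    """Minimum normalized detection cost over all candidate thresholds."""
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    return min(
        (c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)) / c_default
        for p_miss, p_fa in _operating_points(scores, labels)
    )


# Perfectly separated trials: both metrics reach their best value.
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
print(compute_eer(scores, labels))      # 0.0
print(compute_min_dcf(scores, labels))  # 0.0
```

This sweep recomputes the error rates for every threshold, which is quadratic in the number of trials; it favors readability over speed. Toolkits typically sort the scores once and may interpolate between operating points, so their EER values can differ slightly from this discrete estimate.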