2020 28th European Signal Processing Conference (EUSIPCO), 2021
DOI: 10.23919/eusipco47968.2020.9287794

Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition

Cited by 5 publications (3 citation statements)
References 10 publications
“…A reduction of the Sinc filter widths to 31 samples (2 ms) gave a further improvement, especially when the stride was concurrently changed to 2 samples from 1 sample. This is consistent with the observations of previous work that smaller filter widths in the Sinc layer are superior in the context of speaker recognition [23]. While this seems counter-intuitive given that auditory filter impulse responses at lower centre frequencies are of duration well over 10 ms, the reduced frequency resolution due to the apparent truncation does not seem to harm the performance.…”
Section: Single-task Architectures (supporting; confidence: 91%)
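
The filter width and stride discussed in the excerpt above are hyperparameters of the sinc convolution applied directly to the waveform at the network input. The sketch below is a minimal, assumption-labelled illustration in PyTorch (class name, initialisation, and defaults are mine, not taken from the cited papers) of a SincNet-style band-pass layer where those two settings, e.g. a 31-sample kernel with stride 2 at 16 kHz, plug in directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincConv1d(nn.Module):
    """Learnable band-pass sinc filters applied to the raw waveform (illustrative sketch)."""

    def __init__(self, out_channels=80, kernel_size=31, stride=2, sample_rate=16000):
        super().__init__()
        assert kernel_size % 2 == 1, "odd width keeps the filters symmetric"
        self.stride = stride
        self.sample_rate = sample_rate
        # Learnable lower cut-off and bandwidth in Hz, linearly spaced at init.
        edges = torch.linspace(30.0, sample_rate / 2 - 100.0, out_channels + 1)
        self.low_hz = nn.Parameter(edges[:-1].unsqueeze(1))
        self.band_hz = nn.Parameter((edges[1:] - edges[:-1]).unsqueeze(1))
        # Fixed pieces: centred time axis (in seconds) and a Hamming window.
        n = torch.arange(kernel_size) - (kernel_size - 1) / 2
        self.register_buffer("t", (n / sample_rate).unsqueeze(0))
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):  # x: (batch, 1, samples)
        low = torch.abs(self.low_hz)
        high = torch.clamp(low + torch.abs(self.band_hz), max=self.sample_rate / 2)
        # Band-pass impulse response = difference of two windowed sinc low-pass filters.
        h = 2 * high * torch.sinc(2 * high * self.t) - 2 * low * torch.sinc(2 * low * self.t)
        h = h * self.window
        h = h / (h.abs().max(dim=1, keepdim=True).values + 1e-8)
        return F.conv1d(x, h.unsqueeze(1), stride=self.stride)
```

For instance, SincConv1d(out_channels=80, kernel_size=31, stride=2) applied to a (batch, 1, samples) tensor of 16 kHz audio yields 80 band-pass feature channels decimated by a factor of 2; widening kernel_size recovers the longer-filter configurations the excerpt compares against.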
“…The first variation involves replacing the CNN layer at the input with bandpass constrained, but tunable, filters motivated by the traditionally used mel filterbanks emulating the low-level auditory processing [21]. SincNet has been applied in frame-level speaker identification, where its hyperparameters have been found to be critical, although sometimes counter-intuitive, to the achieved task accuracy [23]. Next, we try to exploit the linguistic association between phrase boundaries and prominent words with multi-task learning.…”
Section: Introduction (mentioning; confidence: 99%)
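
The "motivated by mel filterbanks" constraint mentioned in this excerpt is commonly realised by initialising the learnable cut-off frequencies on the mel scale rather than linearly. The helper below is an illustrative assumption (the cited papers may initialise differently); its output could replace the linear spacing used in the SincConv1d sketch above.

```python
import numpy as np


def mel_spaced_band_edges(n_filters=80, sample_rate=16000, f_min=30.0):
    """Return (low_hz, band_hz) for n_filters band-pass filters spaced on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(sample_rate / 2), n_filters + 1))
    return edges[:-1], np.diff(edges)
```

Calling mel_spaced_band_edges(80, 16000) gives 80 lower cut-offs and bandwidths covering 30 Hz to 8 kHz, packed more densely at low frequencies in the manner of a mel filterbank.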
“…The model architectures were manually designed to reduce the model complexity for implementing lightweight SV [42]-[44]. In the work of [42], the original trunk of the SincNet [45] was replaced by a lightweight trunk with 2.8 million parameters to reduce the model complexity. Lee et al. [44] designed a hyperbolic ResNet for lightweight applications.…”
Section: Related Work (mentioning; confidence: 99%)