2021
DOI: 10.48550/arxiv.2110.04692
Preprint

PoFormer: A simple pooling transformer for speaker verification

Abstract: Most recent speaker verification systems are based on the extraction of speaker embeddings using a deep neural network. The pooling layer in the network aims to aggregate frame-level features extracted by the backbone. In this paper, we propose a new transformer-based pooling structure called PoFormer to enhance the ability of the pooling layer to capture information along the whole time axis. Different from previous works that apply the attention mechanism in a simple way or implement the multi-head mechanism in …
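To make the idea concrete, here is a minimal sketch of what a transformer-based pooling layer of this kind could look like, assuming PyTorch. The class name, layer sizes, and the use of a single learnable pooling token are illustrative assumptions, not the authors' released PoFormer implementation.

```python
import torch
import torch.nn as nn

class TransformerPooling(nn.Module):
    """Aggregate frame-level features into one utterance-level embedding.
    Hypothetical sketch, not the paper's exact architecture."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Learnable pooling token prepended to the frame sequence; after
        # self-attention its output summarizes the whole time axis.
        self.pool_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim) frame-level features from the backbone
        token = self.pool_token.expand(frames.size(0), -1, -1)
        x = torch.cat([token, frames], dim=1)  # prepend pooling token
        x = self.encoder(x)                    # attend across all frames
        return x[:, 0]                         # token output = embedding

# Usage: 8 utterances, 200 frames each, 256-dim features -> (8, 256)
pool = TransformerPooling(dim=256)
embedding = pool(torch.randn(8, 200, 256))
```

Unlike simple attentive statistics pooling, every frame here can attend to every other frame before aggregation, which is the property the abstract highlights.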

Cited by 1 publication (1 citation statement)
References 17 publications
“…It is shown in [38] that GELU outperforms both ReLU and the exponential linear unit (ELU) in different tasks, including speech recognition, language processing, and computer vision. For extracting speaker embeddings, GELU is used in Transformer-based and multi-layer perceptron-based speaker verification network (MLP-SVNet) systems [39,40].…”
Section: Introduction (mentioning)
confidence: 99%
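For reference, GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF, so it weights each input by the probability that a standard normal variable falls below it. The snippet below contrasts it with ReLU and ELU using standard PyTorch functions; it is a sketch for illustration, not code from either paper.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)
print(F.gelu(x))  # GELU(x) = x * Phi(x), smooth and non-monotone near 0
print(F.relu(x))  # ReLU(x) = max(0, x), hard cutoff at 0
print(F.elu(x))   # ELU(x)  = x if x > 0 else exp(x) - 1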