Interspeech 2019
DOI: 10.21437/interspeech.2019-1982

RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification

Abstract: Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classifica…

Citations: cited by 119 publications (66 citation statements)
References: 30 publications
“…Note that this likelihood ratio is different from the likelihood ratio of the NL score in Eq. (8). The PLDA LR can be formally represented by:…”
Section: X N From the Same Class P(x From A Unique Class)p(x 1 (mentioning)
confidence: 99%
“…Recently, deep learning methods gained much attention and embedding based on deep neural nets (DNN) becomes popular [5,6]. With the efforts from multiple research groups, deep speaker embedding models have been significantly improved by comprehensive architectures [7,8], smart pooling approaches [9][10][11][12], task-oriented objectives [13][14][15][16][17][18], and carefully designed training schemes [19][20][21]. As a result, the deep embedding approach has achieved state-of-the-art performance [22].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recent advances in deep neural networks (DNNs) have improved the performance of speaker verification (SV) systems, including short-duration and far-field scenarios [1][2][3][4][5]. However, SV systems are known to be vulnerable to various presentation attacks, such as replay attacks, voice conversion, and speech synthesis.…”
Section: Introduction (mentioning)
confidence: 99%
“…To address this problem, several studies have applied a pooling layer or temporal average layer to an end-to-end system [2,3]. The second is a speaker embedding-based system [4][5][6][7][8][9][10][11][12][13][14], which generates an input of variable length into a vector of fixed length using a DNN. The generated vector is used as an embedding to represent the speaker.…”
Section: Introduction (mentioning)
confidence: 99%
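
The last statement above describes turning an input of variable length into a vector of fixed length. A minimal sketch of one way this can be done, temporal average pooling, is given here for illustration only. It assumes frame-level feature vectors have already been produced by some network; the function name `temporal_average_pool` and the toy values are hypothetical and are not taken from RawNet or the citing papers.

```python
from typing import List, Sequence


def temporal_average_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors over time.

    Hypothetical helper: the output length equals the feature dimension,
    regardless of how many frames the utterance contains.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must share the same feature dimension")
    n = len(frames)
    # Mean over the time axis, computed independently for each dimension.
    return [sum(f[d] for f in frames) / n for d in range(dim)]


# Toy utterances of different lengths (2 and 4 frames), 3 features per frame.
short_utt = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
long_utt = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

short_emb = temporal_average_pool(short_utt)
long_emb = temporal_average_pool(long_utt)
print(len(short_emb), len(long_emb))  # both embeddings have the same length
```

Both utterances yield a 3-dimensional vector even though they contain different numbers of frames, which is the property that lets such vectors be compared as speaker embeddings. Practical systems often use learned or statistics-based pooling instead of a plain mean.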