2019
DOI: 10.3390/app9132636

Addressing Text-Dependent Speaker Verification Using Singing Speech

Abstract: The automatic speaker verification (ASV) has achieved significant progress in recent years. However, it is still very challenging to generalize the ASV technologies to new, unknown and spoofing conditions. Most previous studies focused on extracting the speaker information from natural speech. This paper attempts to address the speaker verification from another perspective. The speaker identity information was exploited from singing speech. We first designed and released a new corpus for speaker verification b…

Cited by 6 publications (14 citation statements)
References 33 publications (50 reference statements)
“…Moreover, innovative Deep Learning and Convolutional Neural Networks architectures are deployed in this direction [4,5,35] with 2-D input features [7,15]. In addition, several experiments were conducted either on singing voices [18] or in utterance level [19]. Despite the progression in algorithmic level, many related efforts shifted to the development of mechanisms for archiving increased language material [36][37][38][39][40].…”
Section: Problem Definition and Motivation (mentioning)
confidence: 99%
“…The audio signals were formatted (transcoded) to PCM (Pulse-Code Modulation) Wav files (16-bit depth, 44,100 Hz sample rate). At the same time, the stereo property was discarded, since it could serve only for the music/genre discrimination and not for voice (and language) recognition, as it was thoroughly studied in [16][17][18][19][20].…”
Section: Data Collection-content Preprocessing (mentioning)
confidence: 99%
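
The preprocessing described in the statement above (16-bit PCM WAV at 44,100 Hz with the stereo property discarded) can be illustrated with a minimal sketch. The citing paper does not say how the stereo channels were collapsed, so averaging the left and right samples is an assumption here, and the function names `stereo_to_mono_16bit` and `write_mono_wav` are hypothetical. Only the Python standard library is used.

```python
import array
import sys
import wave


def stereo_to_mono_16bit(frames: bytes) -> bytes:
    """Average interleaved left/right 16-bit PCM samples into one channel.

    Assumption: the downmix is a plain average of the two channels; the
    cited preprocessing step only states that stereo was discarded.
    """
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()


def write_mono_wav(path: str, mono_frames: bytes, sample_rate: int = 44100) -> None:
    """Write mono 16-bit PCM frames to a WAV file at the given sample rate."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit depth
        wav.setframerate(sample_rate)
        wav.writeframes(mono_frames)
```

Reading the source file with `wave.open(path, "rb")` and passing the result of `readframes` to `stereo_to_mono_16bit` would complete the pipeline, provided the input is already 16-bit stereo PCM; decoding other formats is outside the scope of this sketch.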