2021
DOI: 10.1109/access.2021.3084299

A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities

Abstract: Humans can identify a speaker by listening to their voice, whether over the telephone or on any digital device. To replicate this innate human ability, authentication technologies based on voice biometrics, such as automatic speaker recognition (ASR), have been introduced. An ASR system recognizes speakers by analyzing speech signals and characteristics extracted from speakers' voices. ASR has recently become an active research area as an essential aspect of voice biometrics. Specifically, this literature survey giv…

Cited by 84 publications (35 citation statements)
References 215 publications
“…It should be noted that a cross-comparison of voice biometrics’ systems is not easy, due to the availability of multiple voice datasets, and only a comparison of systems on the same input data can be considered as being reliable. The following reviews of speaker recognition systems were also used when comparing the results to other ASR systems [ 13 , 14 , 15 ].…”
Section: Related Work (mentioning)
confidence: 99%
“…Sound recognition plays an important role in most of the encountered audio and audiovisual pattern analysis cases, where related content is massively produced and uploaded (i.e., digital audio broadcasting, podcasts and web radio, but also video on demand (VoD), web-TV and multimodal UGC sharing in general). Specifically, there are various pattern recognition and semantic analysis tasks in the audio domain, including speech-music segmentation [8], genre recognition [10], speaker verification and voice diarization [11], speech enhancement [12,13], sound event detection [14], phoneme and speech recognition [15][16][17], as well as topic/story classification [18][19][20], sentiment analysis and opinion extraction [4,5,21], multiclass audio discrimination [22], environmental sound classification [23] and biomedical audio processing [24]. Audio broadcast is generally considered to be one of the most demanding recognition cases, where a large diversity of content types with many detection difficulties are implicated [1].…”
Section: Introduction (mentioning)
confidence: 99%
“…Although speech recognition technologies have made great advances in recent years in several areas [Kabir et al 2021], using them to assess children's reading is not trivial, because such reading has characteristics of its own, such as more frequent pauses during reading, mispronunciations, false starts of words, and repetitions, among others [Proença et al 2017]. The present work aims to present a heuristic approach to minimize the error in the automatic identification of read words.…”
Section: Introduction (unclassified)