2023
DOI: 10.3390/s23042082

Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library

Abstract: Diarization is an important task when working with audio data, as it addresses the need to divide one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but do not answer the question “who is speaking?” On the other hand, there are biometric…

Cited by 7 publications (6 citation statements)
References 28 publications
“…Khoma [17] developed a multi-architecture of the speaker recognition systems based on the integration of identification and diarization approach. This approach was segmented according to segment-level or group-level classification, aside from an opensource PyAnnote module utilized to build the system.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…Studies [3,29] present the results of utilizing the Pyannote and ECAPA models in diarization tasks rather than verification tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Traditionally, speaker diarization involves both segmentation (detecting speaker changes in an audio recording) and clustering (grouping speech segments corresponding to specific speakers based on voice characteristics). Unlike speaker verification and identification, speaker diarization can be approached using unsupervised learning [3].…”
Section: Introduction (mentioning)
confidence: 99%
“…This collaboration is expected to take place using natural communication, instead of specialized interfaces or protocols, shifting the focus towards human-oriented, context-aware, adaptive systems [5]. With the progress of Artificial Intelligence (AI), as well as NLP (Natural Language Processing), which is clearly seen in modern conversational systems [6,7], the man-machine collaboration becomes an urgent and promising trend in modern industry. Nevertheless, despite recent significant advancements [8][9][10][11] in the field of Human-Computer Interaction (HCI) (namely: the Amazon Echo available since 2015 [12][13][14], Microsoft Cortana massively expanding in 2015 to numerous platforms [15], Google Speech announced in 2016, upgraded in 2018 [16] and 2022 [17], Google Assistant announced in 2016 [18,19], Google Nest introduced as Google Home also in 2016 [20][21][22][23][24], and the Apple Siri updated in 2017 [25,26]) and undoubtful progress in Artificial Intelligence and Machine Learning [27][28][29][30], conversational AI systems usually still tend to disappoint rather than to amaze their interlocutors (and there are some specific reasons for that, already diagnosed and described [31]).…”
Section: Introduction (mentioning)
confidence: 99%