2017 Intelligent Systems Conference (IntelliSys) 2017
DOI: 10.1109/intellisys.2017.8324268

Speaker change detection using features through a neural network speaker classifier

Abstract: The proposed mechanism performs real-time speaker change detection in conversations. It first trains a neural network text-independent speaker classifier using in-domain speaker data. Through the network, features of conversational speech from out-of-domain speakers are then converted into likelihood vectors, i.e. similarity scores compared to the in-domain speakers. These transformed features demonstrate very distinctive patterns, which facilitate differentiating speakers and enable speaker change de…
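To make the "likelihood vector" idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the trained classifier can be stood in for by a single linear layer followed by a softmax; the names `likelihood_vector`, `WEIGHTS`, and `BIASES`, and all numbers, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability vector."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def likelihood_vector(frame, weights, biases):
    """Map one feature frame to similarity scores over in-domain speakers.

    Assumption: a single linear layer stands in for the trained
    speaker classifier described in the abstract.
    """
    scores = [sum(w * x for w, x in zip(row, frame)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Toy setup: 3 in-domain speakers, 2-dimensional features (made-up values).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
BIASES = [0.0, 0.0, 0.0]

vec = likelihood_vector([2.0, 0.5], WEIGHTS, BIASES)
```

Each entry of `vec` can be read as how strongly the frame resembles one in-domain speaker. A frame from an out-of-domain speaker still yields a vector of this form, which is what allows it to serve as a transformed feature.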

Cited by 8 publications (3 citation statements)
References 9 publications
“…The use of i-vectors was also investigated in [32]. Features extracted from DNN were explored in [33]. Finally, the authors in [34] studied in detail the influence of the online environment on various SCP detection approaches to diarization systems.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…SCD is known as an important part of speaker diarization [6], [7]. It was previously modeled with distance-based methods [3], [8], [9], which segment audio with a sliding window, and the distance of speaker embedding is used to decide whether a speaker change happens between the adjacent segments. Since the pitch varies saliently with speaker changes, some pitch-based methods detect speaker changes with the change in pitch [10]-[12].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The distance-based methods [1][2][3][4][5] are studied earliest, which calculate the distance between features in the adjacent windows, and once the distance exceeds the threshold, a speaker change is detected. In addition, the model-based methods [6][7][8][9] segment the input audio into fix-length segments that are assumed to contain only one speaker, and models are trained to extract speaker embeddings for each segment. Then the distance between adjacent speaker embeddings is used to decide whether speaker change happens.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
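
The distance-based approach described in the citation statements above can be sketched as follows. This is an illustrative example under stated assumptions, not code from any of the cited papers: cosine distance is one possible choice of distance, the threshold of 0.5 is arbitrary, and `detect_changes` and the toy embeddings are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def detect_changes(embeddings, threshold=0.5):
    """Flag index i when segments i-1 and i are farther apart than threshold."""
    return [i for i in range(1, len(embeddings))
            if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold]

# Toy per-segment speaker embeddings (made-up values): the first two
# segments resemble one speaker, the last two resemble another.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
changes = detect_changes(segments)  # a change is flagged before segment 2
```

In practice the threshold would need tuning on held-out data, since it trades missed changes against false alarms.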