2019
DOI: 10.1109/taslp.2018.2877892
|View full text |Cite
|
Sign up to set email alerts
|

CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning

Abstract: Estimating the maximum number of concurrent speakers from single-channel mixtures is a challenging problem and an essential first step to address various audio-based tasks such as blind source separation, speaker diarization, and audio surveillance. We propose a unifying probabilistic paradigm, where deep neural network architectures are used to infer output posterior distributions. These probabilities are in turn processed to yield discrete point estimates. Designing such architectures often involves two impo… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

2
53
0

Year Published

2019
2019
2023
2023

Publication Types

Select...
6
1
1

Relationship

0
8

Authors

Journals

citations
Cited by 46 publications
(55 citation statements)
references
References 66 publications
2
53
0
Order By: Relevance
“…ML methods can be divided into two types (Zitnik et al, 2019), supervised learning and unsupervised learning. Supervised learning (Stoter et al, 2019) requires that the model be trained using a training set. The training sets for supervised learning include features and results.…”
Section: Machine Learning Methodsmentioning
confidence: 99%
“…ML methods can be divided into two types (Zitnik et al, 2019), supervised learning and unsupervised learning. Supervised learning (Stoter et al, 2019) requires that the model be trained using a training set. The training sets for supervised learning include features and results.…”
Section: Machine Learning Methodsmentioning
confidence: 99%
“…Speaker counting can be formulated as an (N + 1)-classes classification problem with N the maximum possible number of overlapping speakers [21]. While this approach is not the only one for supervised speaker counting, it has been found to be the most effective [22], provided the maximum possible number is known.…”
Section: Overlapped Speech Detection and Counting Taskmentioning
confidence: 99%
“…In parallel, Stöter et al [21] have shown that a neural network can be trained to estimate the number of concurrent speakers rather than simply performing joint VAD+OSD. This approach has been further expanded in [22] where three different output distributions for this speaker counting problem are proposed, different neural architectures are explored, and the performance is compared with humans. Also, in [23], a deep learning based speaker counting algorithm was evaluated against…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…In addition, various methods based on deep learning on counting the NoS have emerged, such as [18]- [21]. In [22], a new NoS estimation architecture is provided via combining the convolutional recurrent neural networks and adequate input features of speeches, which is designed to improve the performance of NoS estimation from the single channel mixtures.…”
Section: Introductionmentioning
confidence: 99%