A Speaker Count System for Telephone Conversations

Ofoegbu, U.O.; Iyer, Ananth N.; Yantorno, Robert E.; Smolenski, Brett Y.

doi:10.1109/ispacs.2006.364899

Cited by 11 publications

(6 citation statements)

References 4 publications


“…Our proposed model anonymously estimates the number of people from the smartphones' acoustic cum locomotive sensing model where we have employed unsupervised learning techniques to cluster different forms of acoustic signatures. For example, Ofoegbu et al [22] have built a model from mean and covariance matrices of the linear predictive cepstral coefficient (LPCC) of voice segments in conversations and used the Mahalanobis distance to determine whether two models belong to the same or different speakers. Iyer et al [23] have performed speaker clustering using distance of the feature vectors extracted from different speakers and finally applied the modified k-means algorithm with distance metric data.…”

Section: Speaker Sensing (mentioning)

confidence: 99%

Wearable Sensor-Based Location-Specific Occupancy Detection in Smart Environments

Khan, Roy, Hossain (2018), Mobile Information Systems


Occupancy detection helps enable various emerging smart environment applications ranging from opportunistic HVAC (heating, ventilation, and air-conditioning) control, effective meeting management, healthy social gathering, and public event planning and organization. Ubiquitous availability of smartphones and wearable sensors with the users for almost 24 hours helps revitalize a multitude of novel applications. The inbuilt microphone sensor in smartphones plays as an inevitable enabler to help detect the number of people conversing with each other in an event or gathering. A large number of other sensors such as accelerometer and gyroscope help count the number of people based on other signals such as locomotive motion. In this work, we propose multimodal data fusion and deep learning approach relying on the smartphone’s microphone and accelerometer sensors to estimate occupancy. We first demonstrate a novel speaker estimation algorithm for people counting and extend the proposed model using deep nets for handling large-scale fluid scenarios with unlabeled acoustic signals. We augment our occupancy detection model with a magnetometer-dependent fingerprinting-based localization scheme to assimilate the volume of location-specific gathering. We also propose crowdsourcing techniques to annotate the semantic location of the occupant. We evaluate our approach in different contexts: conversational, silence, and mixed scenarios in the presence of 10 people. Our experimental results on real-life data traces in natural settings show that our cross-modal approach can achieve approximately 0.53 error count distance for occupancy detection accuracy on average.
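The citation statement above describes the cited system as modelling each voice segment by the mean and covariance of its LPCC features and comparing two such models with the Mahalanobis distance. The following is a minimal sketch of that comparison, not the authors' implementation. It assumes feature frames are already extracted (random vectors stand in for LPCCs here), that the two covariances are averaged before inversion, and that a fixed threshold decides same versus different speaker. The function names, the covariance averaging, and the threshold value are all illustrative assumptions.

```python
import numpy as np

def segment_model(features):
    """Return (mean vector, covariance matrix) for one segment; rows are frames."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)

def mahalanobis_between(model_a, model_b):
    """Mahalanobis distance between two segment means.

    Assumption: the two segment covariances are averaged; the source
    does not say how the covariance is chosen.
    """
    mean_a, cov_a = model_a
    mean_b, cov_b = model_b
    pooled = 0.5 * (cov_a + cov_b)
    diff = mean_a - mean_b
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))

def same_speaker(model_a, model_b, threshold=1.0):
    """Hypothetical decision rule: small distance means same speaker."""
    return mahalanobis_between(model_a, model_b) < threshold

# Synthetic stand-ins for 12-dimensional LPCC frames (not real speech).
rng = np.random.default_rng(0)
seg_1 = segment_model(rng.normal(0.0, 1.0, size=(500, 12)))
seg_2 = segment_model(rng.normal(0.0, 1.0, size=(500, 12)))  # same distribution
seg_3 = segment_model(rng.normal(3.0, 1.0, size=(500, 12)))  # shifted mean

print(mahalanobis_between(seg_1, seg_2), same_speaker(seg_1, seg_2))
print(mahalanobis_between(seg_1, seg_3), same_speaker(seg_1, seg_3))
```

In practice the threshold would have to be tuned on labelled data; the value used here only separates the synthetic examples.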


“…Precisely controlling these parameters at the same time in real world experiments is often unfeasible. For this reason, we follow a common approach in the speech community and generate a separate dataset, as previously shown in [26]. Specifically, we collect audio recordings from 4 male and 4 female participants using a smartphone.…”

Section: Performance With Various Conversation Parameters (mentioning)

confidence: 99%

“…The closest related research to Crowd++ is [1] and [26]. Agneessens et al [1] present a pitch estimation algorithm to recognize a single speaker from audio recordings containing two speakers with 70% of the times correctly estimate the speaker count (referred to as counting accuracy).…”

Section: Speaker Counting (mentioning)

confidence: 99%

“…Ofoegbu et al [26] present 60% counting accuracy for 4 speakers (versus Crowd++'s 68% counting accuracy under the same conditions and settings) and a generalized residual radio algorithm with a computational complexity of O(N²) (versus Crowd++'s O(N)). Moreover, the data set in [26] is based on staged data from the HTIMIT database [30] containing transcribed speech of American English speakers. Crowd++'s focus instead is the analysis of audio recordings challenged by noise, mobility and obstacles as people go about their daily lives.…”

Section: Speaker Counting (mentioning)

confidence: 99%


Crowd++

Liu et al. (2013), Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing


Smartphones are excellent mobile sensing platforms, with the microphone in particular being exercised in several audio inference applications. We take smartphone audio inference a step further and demonstrate for the first time that it's possible to accurately estimate the number of people talking in a certain place (with an average error distance of 1.5 speakers) through unsupervised machine learning analysis on audio segments captured by the smartphones. Inference occurs transparently to the user and no human intervention is needed to derive the classification model. Our results are based on the design, implementation, and evaluation of a system called Crowd++, involving 120 participants in 10 very different environments. We show that no dedicated external hardware or cumbersome supervised learning approaches are needed but only off-the-shelf smartphones used in a transparent manner. We believe our findings have profound implications in many research fields, including social sensing and personal wellbeing assessment.
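The statements and abstracts on this page compare systems by two figures: counting accuracy (how often the estimated speaker count is exactly right) and average error distance (how far off the estimate is on average). The sketch below shows one plausible way to compute both. It assumes that error distance means the absolute difference between estimated and true counts, which the text here does not spell out. The function names and sample numbers are illustrative only.

```python
def counting_accuracy(estimated, actual):
    """Fraction of recordings whose estimated speaker count is exactly right."""
    if len(estimated) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for e, a in zip(estimated, actual) if e == a)
    return hits / len(actual)

def average_error_distance(estimated, actual):
    """Mean absolute difference between estimated and true counts.

    Assumption: 'error distance' is |estimated - actual| per recording.
    """
    if len(estimated) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

# Made-up counts for three recordings (not data from any cited paper).
estimated_counts = [3, 5, 2]
true_counts = [4, 5, 4]
print(counting_accuracy(estimated_counts, true_counts))
print(average_error_distance(estimated_counts, true_counts))
```

The two figures answer different questions: a system can have low counting accuracy yet a small average error distance if its misses are usually off by one.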


“…In this study, the speaker count is assumed to be known as the goal here is to simply compare the clustering performance. Moreover, separate investigations have already been performed to determine the speaker in a given conversation (Ofoegbu et al. 2006b, 2006c; Iyer et al. 2006). Note that the task of speaker clustering is different from SID, due to the fact that telephone conversations are analyzed, where the presence of long speaker homogeneous utterances is limited.…”

Section: Speaker Clustering (mentioning)

confidence: 99%

Speaker distinguishing distances: a comparative study

Iyer, Ofoegbu, Yantorno et al. (2007), Int J Speech Technol (self-citation)


Speaker discrimination is a vital aspect of speaker recognition applications such as speaker identification, verification, clustering, indexing and change-point detection. These tasks are usually performed using distance-based approaches to compare speaker models or features from homogeneous speaker segments in order to determine whether or not they belong to the same speaker. Several distance measures and features have been examined for all the different applications, however, no single distance or feature has been reported to perform optimally for all applications in all conditions. In this paper, a thorough analysis is made to determine the behavior of some frequently used distance measures, as well as features, in distinguishing speakers for different data lengths. Measures studied include the Mahalanobis distance, Kullback-Leibler (KL) distance, T² statistic, Hellinger distance, Bhattacharyya distance, Generalized Likelihood Ratio (GLR), Levenne distance, L2 and L∞ distances. The Mel-Scale Frequency Cepstral Coefficient (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Line Spectral Pairs (LSP) and the Log Area Ratios (LAR) comprise the features investigated. The usefulness of these measures is studied for different data lengths. Generally, a larger data size for each speaker results in better speaker differentiating capability, as more information can be taken into account. However, in some applications such as segmentation of telephone data, speakers change frequently, making it impossible to obtain large speaker-consistent utterances (especially when speaker change-points are unknown). A metric is defined for determining the probability of speaker discrimination error obtainable for each distance measure using each feature set, and the effect of data size on this probability is observed. Furthermore, simple distance-based speaker identification and clustering systems are developed, and the performances of each distance and feature for various data sizes are evaluated on these systems in order to illustrate the importance of choosing the appropriate distance and feature for each application. Results show that for tasks which do not involve any limitation of data length, such as speaker identification, the Kullback-Leibler distance with the MFCCs yield the highest speaker differentiation performance, which is comparable to results obtained using more complex state-of-the-art speaker identification systems. Results also indicate that the Hellinger and Bhattacharyya distances with the LSPs yield the best performance for small data sizes.
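Several of the measures named in this abstract have closed forms when each speaker segment is summarised as a multivariate Gaussian (mean vector plus covariance matrix). The sketch below gives the standard Gaussian formulas for the Bhattacharyya distance, the Hellinger distance derived from it, and the directed KL divergence. The Gaussian modelling assumption and all function names are illustrative and are not taken from the paper, which may define or symmetrise these measures differently.

```python
import numpy as np

def _gaussian(mean, cov):
    """Coerce inputs to a float mean vector and a 2-D covariance matrix."""
    return np.atleast_1d(np.asarray(mean, dtype=float)), np.atleast_2d(np.asarray(cov, dtype=float))

def bhattacharyya_distance(mean_p, cov_p, mean_q, cov_q):
    """Bhattacharyya distance between two Gaussians."""
    mean_p, cov_p = _gaussian(mean_p, cov_p)
    mean_q, cov_q = _gaussian(mean_q, cov_q)
    cov_avg = 0.5 * (cov_p + cov_q)
    diff = mean_p - mean_q
    mean_term = 0.125 * (diff @ np.linalg.solve(cov_avg, diff))
    logdet_avg = np.linalg.slogdet(cov_avg)[1]
    logdet_p = np.linalg.slogdet(cov_p)[1]
    logdet_q = np.linalg.slogdet(cov_q)[1]
    return float(mean_term + 0.5 * (logdet_avg - 0.5 * (logdet_p + logdet_q)))

def hellinger_distance(mean_p, cov_p, mean_q, cov_q):
    """Hellinger distance, via H^2 = 1 - exp(-Bhattacharyya distance)."""
    d_b = bhattacharyya_distance(mean_p, cov_p, mean_q, cov_q)
    return float(np.sqrt(max(0.0, 1.0 - np.exp(-d_b))))

def kl_divergence(mean_p, cov_p, mean_q, cov_q):
    """Directed KL divergence KL(p || q) between two Gaussians."""
    mean_p, cov_p = _gaussian(mean_p, cov_p)
    mean_q, cov_q = _gaussian(mean_q, cov_q)
    diff = mean_q - mean_p
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    quad_term = diff @ np.linalg.solve(cov_q, diff)
    logdet_ratio = np.linalg.slogdet(cov_q)[1] - np.linalg.slogdet(cov_p)[1]
    return float(0.5 * (trace_term + quad_term - mean_p.size + logdet_ratio))

# One-dimensional illustration: unit variances, means two units apart.
print(bhattacharyya_distance([0.0], [[1.0]], [2.0], [[1.0]]))
print(hellinger_distance([0.0], [[1.0]], [2.0], [[1.0]]))
print(kl_divergence([0.0], [[1.0]], [2.0], [[1.0]]))
```

The KL divergence is not symmetric. A symmetric "KL distance" is commonly obtained by adding the two directions, `kl_divergence(p, q) + kl_divergence(q, p)`. Whether the paper uses that form is not stated here.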
