Conventional speaker localization algorithms, based merely on the received microphone signals, are often sensitive to adverse conditions such as high reverberation or a low signal-to-noise ratio (SNR). In some scenarios, e.g. in meeting rooms or cars, it can be assumed that the source position is confined to a predefined area and that the acoustic parameters of the environment are approximately fixed. Such scenarios give rise to the assumption that the acoustic samples from the region of interest have a distinct geometrical structure. In this paper, we show that the high-dimensional acoustic samples indeed lie on a low-dimensional manifold and can be embedded into a low-dimensional space. Motivated by this result, we propose a semi-supervised source localization algorithm that recovers the inverse mapping between the acoustic samples and their corresponding locations. The idea is to use an optimization framework based on manifold regularization, which imposes smoothness constraints on possible solutions with respect to the manifold. The proposed algorithm, termed Manifold Regularization for Localization (MRL), is implemented in an adaptive manner. The initialization is conducted with only a few labelled samples paired with their respective source locations, and the system is then gradually adapted as new unlabelled samples (with unknown source locations) are received. Experimental results show superior localization performance when compared with a recently presented algorithm based on a manifold learning approach and with the generalized cross-correlation (GCC) algorithm as a baseline.
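The role of the smoothness constraint can be illustrated with a small sketch. It is not the MRL algorithm itself: it is a simplified, transductive variant (graph-Laplacian regularized least squares over a fixed sample set), with no adaptive update. The toy features, the Gaussian bandwidth, the weight `gamma`, and all function names are assumptions made for illustration only.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def graph_laplacian(features, bandwidth):
    """Build L = D - W from Gaussian affinities between feature vectors."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((u - v) ** 2 for u, v in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * bandwidth ** 2))
    lap = [[-w[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(w[i])
    return lap

def estimate_positions(features, labels, gamma=1.0, bandwidth=0.3):
    """Minimise sum over labelled i of (f_i - y_i)^2 + gamma * f^T L f.

    `labels` maps a sample index to its known (scalar) position.
    Setting the gradient to zero gives the system (J + gamma L) f = J y,
    where J is the diagonal indicator of labelled samples.
    """
    n = len(features)
    lap = graph_laplacian(features, bandwidth)
    a = [[gamma * lap[i][j] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for i, y in labels.items():
        a[i][i] += 1.0
        b[i] = y
    return solve_linear(a, b)

# Toy data: 21 samples on a 1-D curve embedded in 3-D feature space.
# The "position" of each sample is the curve parameter (an angle).
n_samples = 21
angles = [math.pi * k / (n_samples - 1) for k in range(n_samples)]
features = [(math.cos(t), math.sin(t), math.cos(2 * t)) for t in angles]

# Only three samples carry a known position; the rest are unlabelled.
labels = {0: angles[0], 10: angles[10], 20: angles[20]}
estimates = estimate_positions(features, labels)
```

Because every estimate is a weighted average of its graph neighbours (and, for labelled samples, of the label as well), the estimated positions of the unlabelled samples stay within the range spanned by the labels. That is the smoothness-with-respect-to-the-manifold effect in its simplest form.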
Objective: The present study implements an automatic method of assessing arousal in vocal data as well as dynamic system models to explore intrapersonal and interpersonal affect dynamics within psychotherapy and to determine whether these dynamics are associated with treatment outcomes. Method: The data of 21,133 mean vocal arousal observations were extracted from 279 therapy sessions in a sample of 30 clients treated by 24 therapists. Before and after each session, clients self-reported their well-being level, using the Outcome Rating Scale. Results: Both clients' and therapists' vocal arousal showed intrapersonal dampening. Specifically, although both therapists and clients departed from their baseline, their vocal arousal levels were "pulled" back to these baselines. In addition, both clients and therapists exhibited interpersonal dampening. Specifically, both the clients' and the therapists' levels of arousal were "pulled" toward the other party's arousal level, and clients were "pulled" by their therapists' vocal arousal toward their own baseline. These dynamics exhibited a linear change over the course of treatment: whereas interpersonal dampening decreased over time, there was an increase in intrapersonal dampening over time. In addition, higher levels of interpersonal dampening were associated with better session outcomes. Conclusions: These findings demonstrate the advantages of using automatic vocal measures to capture nuanced intrapersonal and interpersonal affect dynamics in psychotherapy and demonstrate how these dynamics are associated with treatment gains. Public Health Significance Statement: The current findings highlight the potential of computerized vocal analyses to capture moment-by-moment processes within psychotherapy sessions. They suggest that clients and therapists exhibit both intrapersonal (within person) as well as interpersonal (between person) affect dynamics in their in-session emotional arousal levels. 
Specifically, both clients and therapists not only tended to return to their own affective arousal baseline but also tended to be "pulled" by their partner toward their baseline arousal level. The findings advance the idea that therapists who are synchronized with their clients, but at the same time downregulate their own and their clients' affect, may be more successful in helping their clients develop better affective regulation capabilities.
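The two kinds of "pull" described above can be pictured with a toy simulation. This is a hypothetical first-order coupled model, not the dynamic system model fitted in the study: the update rule, the parameter names `self_pull` and `partner_pull`, and all numeric values are assumptions chosen only to show the qualitative behaviour.

```python
def simulate_dampening(client0, therapist0, client_base, therapist_base,
                       self_pull=0.2, partner_pull=0.1, steps=50):
    """Iterate a toy coupled model of vocal arousal.

    self_pull    : strength of the pull toward one's own baseline
                   (stands in for intrapersonal dampening).
    partner_pull : strength of the pull toward the partner's current level
                   (stands in for interpersonal dampening).
    """
    client, therapist = client0, therapist0
    trajectory = [(client, therapist)]
    for _ in range(steps):
        next_client = (client
                       + self_pull * (client_base - client)
                       + partner_pull * (therapist - client))
        next_therapist = (therapist
                          + self_pull * (therapist_base - therapist)
                          + partner_pull * (client - therapist))
        client, therapist = next_client, next_therapist
        trajectory.append((client, therapist))
    return trajectory

# Both speakers start far from their (arbitrary) baselines of 1.0 and 0.0.
trajectory = simulate_dampening(3.0, -1.0, client_base=1.0, therapist_base=0.0)
final_client, final_therapist = trajectory[-1]
```

With these arbitrary settings both series settle near 0.75 and 0.25 respectively: each returns toward its own baseline, shifted slightly toward the partner by the coupling term.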
The problem of source localization with ad hoc microphone networks in noisy and reverberant enclosures, given a training set of prerecorded measurements, is addressed in this paper. The training set is assumed to consist of a limited number of labelled measurements, each paired with its corresponding position, and a larger number of unlabelled measurements from unknown locations. Microphone calibration is not required. We use a Bayesian inference approach for estimating a function that maps measurement-based feature vectors to the corresponding positions. The central issue is how to combine the information provided by the different microphones in a unified statistical framework. To address this challenge, we model this function using a Gaussian process with a covariance function that encapsulates both the connections between pairs of microphones and the relations among the samples in the training set. The parameters of the process are estimated by optimizing a maximum likelihood (ML) criterion. In addition, a recursive adaptation mechanism is derived in which new streaming measurements are used to update the model. Performance is demonstrated for 2-D localization on both simulated data and real-life recordings under a variety of reverberation and noise levels.
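A minimal sketch of Gaussian process regression from multi-microphone features to a position may help fix ideas. It is heavily simplified relative to the abstract: it uses labelled samples only, fixed hyperparameters instead of ML estimation, no recursive adaptation, and a covariance that simply averages one Gaussian kernel per microphone. That averaged kernel, the toy data, and every name below are illustrative assumptions, not the paper's covariance function.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multi_mic_kernel(sample_a, sample_b, length_scale):
    """Average of per-microphone Gaussian kernels (an assumed combination).

    Each sample is a list with one feature vector per microphone.
    """
    total = 0.0
    for feat_a, feat_b in zip(sample_a, sample_b):
        d2 = sum((u - v) ** 2 for u, v in zip(feat_a, feat_b))
        total += math.exp(-d2 / (2.0 * length_scale ** 2))
    return total / len(sample_a)

def fit_gp(train_samples, targets, length_scale=0.5, noise_var=1e-6):
    """Return weights alpha = (K + noise_var * I)^{-1} y for one coordinate."""
    n = len(train_samples)
    gram = [[multi_mic_kernel(train_samples[i], train_samples[j], length_scale)
             + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve_linear(gram, targets)

def predict_mean(sample, train_samples, weights, length_scale=0.5):
    """GP posterior mean: sum_i alpha_i * k(sample, train_i)."""
    return sum(w * multi_mic_kernel(sample, t, length_scale)
               for w, t in zip(weights, train_samples))

# Toy training set: four labelled samples, two microphones, 1-D features.
train_samples = [[[0.0], [0.0]], [[1.0], [0.5]], [[2.0], [1.0]], [[3.0], [1.5]]]
positions = [(1.0, 1.0), (2.0, 1.5), (3.0, 1.0), (4.0, 2.0)]

# Fit one independent GP per coordinate of the 2-D position.
weights_x = fit_gp(train_samples, [p[0] for p in positions])
weights_y = fit_gp(train_samples, [p[1] for p in positions])

# Predict the position for a new measurement lying between samples 1 and 2.
query = [[1.5], [0.75]]
estimate = (predict_mean(query, train_samples, weights_x),
            predict_mean(query, train_samples, weights_y))
```

With a very small noise variance the posterior mean nearly interpolates the labelled samples, so evaluating `predict_mean` at a training sample returns approximately its stored position. Unlabelled samples would enter through the covariance function in the approach the abstract describes, which this sketch does not attempt.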