Ambient Intelligence aims to create smart spaces that provide services in a transparent and non-intrusive fashion, which makes context awareness and user adaptation key issues. In such scenarios, speech can be exploited for user adaptation by continuously tracking speaker identity. However, most speaker tracking approaches must process the full audio recording before determining speaker turns, which makes them unsuitable for online processing and low-latency decision-making. This work presents a low-latency speaker tracking system that processes continuous audio streams and outputs a decision every second by scoring fixed-length audio segments against a set of target speaker models. A smoothing technique based on the scores of past segments is also explored; it makes tracking decisions more robust to local variability. Experimental results on the AMI Corpus of meeting conversations show that the proposed approach is effective when compared with an offline speaker tracking approach developed for reference.
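The decision loop described above can be illustrated with a minimal sketch. It is not the system evaluated in this work: the abstract does not specify the smoothing technique, so a plain moving average over the last few segment scores is assumed here, and the names `track_speakers`, `window`, and `threshold` are hypothetical. Scores are taken as given (one per target speaker per one-second segment); how they are computed from speaker models is outside the sketch.

```python
from collections import deque


def track_speakers(segment_scores, speakers, window=3, threshold=0.0):
    """Return one tracking decision per segment.

    segment_scores: iterable of score lists, one score per target speaker,
                    one list per fixed-length segment (assumed precomputed).
    speakers:       target speaker labels, in the same order as the scores.
    window:         number of most recent segments averaged (assumption:
                    moving-average smoothing; the source does not name one).
    threshold:      minimum smoothed score to accept a target speaker;
                    otherwise the decision is None (no target detected).
    """
    history = deque(maxlen=window)  # keeps only the most recent segments
    decisions = []
    for scores in segment_scores:
        history.append(scores)
        # Average each speaker's score over the segments seen so far
        # (at most `window` of them); only past and current data are used.
        smoothed = [sum(col) / len(history) for col in zip(*history)]
        best = max(range(len(speakers)), key=lambda i: smoothed[i])
        decisions.append(speakers[best] if smoothed[best] > threshold else None)
    return decisions


# Made-up scores for two target speakers over seven one-second segments.
# Segment 2 is a local outlier that briefly favours speaker "B".
scores = [
    [2.0, -1.0], [1.5, -0.5], [-0.2, 0.1], [1.8, -1.2],
    [-1.0, 2.0], [-1.5, 2.5], [-1.2, 2.2],
]
raw = track_speakers(scores, ["A", "B"], window=1)
smooth = track_speakers(scores, ["A", "B"], window=3)
print(raw)
print(smooth)
```

With `window=1` the outlier at segment 2 flips the decision to "B" for one second; with `window=3` the decision stays on "A" until the genuine speaker change at segment 4. Because the average uses only past and current segments, each decision is available as soon as its segment has been scored, at the cost of some delay in reacting to true speaker changes when the window is long.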