2018
DOI: 10.1109/tmm.2017.2777671

Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion

Abstract: In an object-based spatial audio system, the positions of the audio objects (e.g. speakers/talkers or voices) presented in the sound scene are required as important metadata attributes for object acquisition and reproduction. Binaural microphones are often used as a physical device to mimic human hearing and to monitor and analyse the scene, including localisation and tracking of multiple speakers. The binaural audio tracker, however, is usually prone to errors caused by room reverberation and background noise. T…

Citations: cited by 14 publications (4 citation statements)
References: 43 publications
“…The TDoA estimates are widely used in acoustic localization and tracking [50], which rely on the similarities of audio signals captured by a microphone pair. A general way to derive TDoA estimates is via finding the peak position of the Generalized Cross Correlation (GCC) function [51].…”
Section: Audio Observation - Single Speaker
Citation type: mentioning
confidence: 99%
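
The peak-picking idea in the statement above can be sketched in a few lines. This is a minimal illustration only, assuming the PHAT weighting (one common GCC variant, not necessarily the one used in the cited work); the function name `gcc_phat_tdoa` and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, from the GCC-PHAT peak.

    x, y    : 1-D signals from the two microphones (hypothetical inputs)
    fs      : sampling rate in Hz
    max_tau : optional bound on the physically possible delay, in seconds
    """
    n = len(x) + len(y)                 # zero-pad so circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation over lags

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

A positive result means the second signal lags the first. Bounding the search with `max_tau` (for example, microphone spacing divided by the speed of sound) is a common way to reject spurious peaks caused by reverberation.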
“…This problem has been solved with accurate calibration and rectification. Various inexpensive off-the-shelf 360 cameras with two fish-eye lenses have recently become popular 3,4,5.…”
Section: A Approximated Room Geometry Reconstruction
Citation type: mentioning
confidence: 99%
“…Audio and image processing have been investigated as separate research areas, typically ignoring their synergy when they work together. Recently, some works have been proposed to exploit their multimodal information, for applications such as speaker tracking [4], speech recognition [5], and event detection [6]. In this paper, we apply computer vision techniques to support audio reproduction adapted to the acoustics of a specific location.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…mainly exploited by the signal processing community. GM-PHD and Sequential Monte Carlo (SMC)-PHD filters are two commonly used implementations in this theory, as they have been able to generate convincing tracking performance in video-based multi-target tracking [2], [3], [5], [7], [15]- [17]. This is attributed to the advantages of PHD filtering methods, as they have the ability to deal with varying number of targets, and also provide the estimates in both cardinality and localization with relatively low computational cost [2].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
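
The remark that PHD filters yield both cardinality and localization estimates can be illustrated for the Gaussian-mixture case. The sketch below assumes a GM-PHD posterior is already available as component weights and means. The sum of the weights approximates the expected number of targets, and components with large weights are read off as state estimates. The function name `extract_estimates` and the 0.5 threshold are illustrative assumptions (a commonly used heuristic), not details taken from the cited papers.

```python
import numpy as np

def extract_estimates(weights, means, threshold=0.5):
    """Read cardinality and state estimates off a Gaussian-mixture PHD.

    weights   : component weights of the mixture (hypothetical input)
    means     : component means, one row per component
    threshold : weight above which a component is reported as a target
                (0.5 is an assumed, commonly used value)
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)

    # The PHD integrates to the expected number of targets;
    # for a Gaussian mixture that integral is the sum of the weights.
    expected_count = float(weights.sum())
    n_targets = int(round(expected_count))

    # Localization: keep the means of the strongly weighted components.
    states = means[weights > threshold]
    return n_targets, states
```

For example, weights of `[0.9, 0.05, 0.95, 0.1]` sum to about 2, so two targets are reported, located at the means of the first and third components.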