Identity association using PHD filters in multiple head tracking with depth sensors

Liu, Qingju; Campos, Teófilo de; Wang, Wenwu; Hilton, Adrian

doi:10.1109/icassp.2016.7471928

Cited by 4 publications

(3 citation statements)

References 18 publications


“…To rectify erroneous IDs, we have applied an ID association scheme (top right corner in Fig. 1) [46], which can be applied to the PHD-filtered results directly. However, there were still some remaining ID errors, as shown in Sequences 3 and 4 in Fig.…”

Section: Results (mentioning)

confidence: 99%


Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion

Liu, Wang, Campos, et al. 2018

IEEE Trans. Multimedia

Self Cite


In an object-based spatial audio system, the positions of the audio objects (e.g. speakers/talkers or voices) presented in the sound scene are required as important metadata attributes for object acquisition and reproduction. Binaural microphones are often used as a physical device to mimic human hearing and to monitor and analyse the scene, including localisation and tracking of multiple speakers. The binaural audio tracker, however, is usually prone to errors caused by room reverberation and background noise. To address this limitation, we present a multimodal tracking method that fuses the binaural audio with depth information (from a depth sensor, e.g., Kinect). More specifically, the PHD filtering framework is first applied to the depth stream, and a novel clutter intensity model is proposed to improve the robustness of the PHD filter when an object is occluded, either by other objects or because of the limited field of view of the depth sensor. To compensate for mis-detections in the depth stream, a novel gap-filling technique is presented to map audio azimuths obtained from the binaural audio tracker to 3D positions, using speaker-dependent spatial constraints learned from the depth stream. With our proposed method, both the errors in the binaural tracker and the mis-detections in the depth tracker can be significantly reduced. Real-room recordings are used to show the improved performance of the proposed method in removing outliers and reducing mis-detections.


Section: Results (mentioning)

confidence: 99%

“…We proposed in our early work in [46] an ID association scheme with short- and long-term analysis. The principle for the short-term analysis is to keep the consistency and continuity of a target's movements within a small time interval.…”

Section: B Id Association (mentioning)

confidence: 99%

Multiple Speaker Tracking in Spatial Audio via PHD Filtering and Depth-Audio Fusion

Liu, Wang, Campos, et al. 2018

IEEE Trans. Multimedia

Self Cite


“…To remove clutters, we used a modified probability hypothesis density (PHD) filtering method [29] with an adaptive clutter intensity model, which takes into account measurement-driven occlusion detection as well as the depth sensor's field of view. After PHD filtering, we applied an identity (ID) association scheme [30], to ensure that the detected ID of each tracked person was consistent throughout a whole scene. Finally, to compensate mis-detections, additional information extracted from the binaural recordings was exploited.…”

Section: A Metadata Estimation (mentioning)

confidence: 99%

An Audio-Visual System for Object-Based Audio: From Recording to Listening

Coleman, Franck, Francombe, et al. 2018

IEEE Trans. Multimedia

Self Cite


Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and thus produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This article introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, i.e., recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group), is evaluated with perceptually-motivated objective and subjective experiments. These experiments demonstrate that the novel components of the system add capabilities beyond the state of the art. Finally, we discuss challenges and future perspectives for object-based audio workflows.


Anchor-based group detection in crowd scenes

Chen, Wang 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
