2020
DOI: 10.1109/tip.2020.3013801

Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs

Abstract: Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph…
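The abstract describes relating person detections to body-worn IMUs. As a minimal illustrative sketch (not the paper's actual formulation, which is truncated above), once a network has produced compatibility scores between detections and IMUs, assigning each detection to its IMU can be cast as a linear assignment problem; the score values and problem size here are made up:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical compatibility scores (higher = better match) between
# 3 person detections (rows) and 3 body-worn IMUs (columns), e.g. as
# output by a learned similarity network comparing body orientations.
scores = np.array([
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.3, 0.7],
])

# The Hungarian algorithm minimizes cost, so negate the scores to
# maximize the total compatibility of the detection-IMU pairing.
det_idx, imu_idx = linear_sum_assignment(-scores)
assignment = dict(zip(det_idx.tolist(), imu_idx.tolist()))
print(assignment)  # {0: 0, 1: 1, 2: 2}
```

This one-to-one matching is only a building block; the paper itself embeds such associations into a graph formulation for long-term tracking.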

Cited by 15 publications (9 citation statements)
References 71 publications
“…Although the global position can be determined by leveraging the distance sensors, subjects can only act within a fixed volume to keep that the distance can be measured by ultrasonic sensors. Another category of works propose to combine IMUs with videos [Gilbert et al 2018; Henschel et al 2020; Malleson et al 2019; Marcard et al 2016; Pons-Moll et al 2011, 2010], RGB-D cameras [Helten et al 2013; Zheng et al 2018], or optical markers [Andrews et al 2016]. Gilbert et al [Gilbert et al 2018] fuse multi-viewpoint videos with IMU signals and estimate human poses using 3D convolutional neural networks and recurrent neural networks.…”
Section: Combining IMUs With Other Sensors or Cameras
confidence: 99%
“…This task is even more challenging due to the lack of direct distance measurements, and the acceleration measurements are too noisy to be used directly [Marcard et al 2017]. Previous works address this task by introducing additional vision inputs [Andrews et al 2016; Henschel et al 2020; Malleson et al 2019] or distance measurements [Liu et al 2011; Vlasic et al 2007], which increase the complexity of the system. While the work of SIP [Marcard et al 2017] estimates global translations from IMUs only, it has to run in an offline manner.…”
Section: Fusion-based Global Translation Estimation
confidence: 99%
“…Therefore, current scientific research also focuses on fusion techniques for multimodal datasets, e.g. [29] and [32], or on developing network architectures that are capable of handling both input types of data [16,43]. However, algorithms from computer graphics and computer vision or even language processing often serve as inspiration, which are then transferred to sensor data-based human activity recognition.…”
Section: Related Work
confidence: 99%
“…In stark contrast, combining occlusion-unaware body-worn sensors for robust motion capture has been widely explored (Henschel, Von Marcard, and Rosenhahn 2020; Pons-Moll et al 2011; Gilbert et al 2019; Zhang et al 2020; Kaufmann et al 2021). Utilizing Inertial Measurement Units (IMUs) for motion inertia recording is a very popular trend.…”
Section: Introduction
confidence: 99%