2018
DOI: 10.1109/tmm.2017.2726187
Deep Temporal Multimodal Fusion for Medical Procedure Monitoring Using Wearable Sensors

Cited by 62 publications (33 citation statements)
References 57 publications
“…Due to the recent popularity of deep neural networks (DNN) in multimedia applications, several deep learning-based fusion frameworks for HAR have recently been presented. In [23], a supervised deep multimodal fusion framework for process monitoring and verification in the medical and healthcare fields is presented; it relies on the simultaneous processing of motion data acquired with wearable sensors and video data acquired with a body-mounted camera. The authors in [24] proposed DNN-based fusion of images and inertial data to improve the performance of human action recognition.…”
Section: Related Work
confidence: 99%
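The excerpt above describes fusing wearable-sensor motion data with body-camera video in a deep network. As a concrete illustration of that pattern, here is a minimal PyTorch sketch: two modality-specific temporal encoders whose summaries are concatenated before classification. The GRU encoders, layer sizes, and input dimensions are illustrative assumptions, not the architecture of [23].

```python
# A minimal sketch of deep temporal multimodal fusion: two modality-specific
# encoders whose features are concatenated before classification.
# All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, imu_dim=6, video_dim=512, hidden=128, num_classes=10):
        super().__init__()
        # Encode a window of inertial (wearable-sensor) samples with a GRU.
        self.imu_encoder = nn.GRU(imu_dim, hidden, batch_first=True)
        # Assume per-frame video features (e.g., from a pretrained CNN).
        self.video_encoder = nn.GRU(video_dim, hidden, batch_first=True)
        # Feature-level fusion: concatenate the two temporal summaries.
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, imu_seq, video_seq):
        _, h_imu = self.imu_encoder(imu_seq)      # h_imu: (1, B, hidden)
        _, h_vid = self.video_encoder(video_seq)  # h_vid: (1, B, hidden)
        fused = torch.cat([h_imu[-1], h_vid[-1]], dim=1)
        return self.classifier(fused)

# Example: a batch of 4 windows, 100 IMU samples and 30 video frames each.
model = TwoStreamFusion()
logits = model(torch.randn(4, 100, 6), torch.randn(4, 30, 512))
print(logits.shape)  # torch.Size([4, 10])
```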
“…Different ways of combining the features have been proposed, depending on the type of application, so that these multidimensional feature vectors can then be merged into transformed joint feature vectors, from which classification is then carried out [24]. Examples of feature-level fusion are: Feature Aggregation [14,25-31], Temporal Fusion [32], a Support Vector Machine (SVM)-based multisensor fusion algorithm [33], and the Data Fusion Location algorithm [18].…”
Section: State Of The Art
confidence: 99%
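As an illustration of the feature aggregation and SVM-based fusion mentioned in this excerpt, the sketch below concatenates two hypothetical per-modality feature matrices into joint vectors and classifies them with an SVM. The feature dimensions and synthetic data are assumptions for demonstration only, not the setup of the cited works.

```python
# A minimal sketch of feature-level fusion: per-sensor feature vectors are
# concatenated into one joint vector and classified with an SVM.
# Shapes and data below are synthetic assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical per-modality features (e.g., accelerometer and gyroscope
# statistics extracted per time window).
accel_feats = rng.normal(size=(n_samples, 12))
gyro_feats = rng.normal(size=(n_samples, 8))
labels = rng.integers(0, 3, size=n_samples)

# Feature aggregation: concatenate along the feature axis to form the
# joint representation, then classify the fused vectors.
fused = np.concatenate([accel_feats, gyro_feats], axis=1)  # (200, 20)

clf = SVC(kernel="rbf").fit(fused, labels)
print(clf.predict(fused[:5]))
```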
“…Coordinates x and y are the width and height dimensions of the face image, and z is the depth dimension. The face points correspond to the left eye (0-7), right eye (8-15), left eyebrow (16-25), right eyebrow (26-35), nose (36-47), mouth (48-57)…”
Section: GFE Data Sets
confidence: 99%
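The index ranges quoted above map cleanly to a small lookup structure. The sketch below encodes them for a 58-point (x, y, z) landmark array; the `split_landmarks` helper is a hypothetical convenience, and the assumption that the mouth region ends at point 57 follows the truncated excerpt.

```python
# A small sketch encoding the landmark index ranges quoted above, assuming
# a 58-point (x, y, z) face layout. The helper is hypothetical, not part of
# the cited data set's tooling.
import numpy as np

FACE_REGIONS = {
    "left_eye": range(0, 8),
    "right_eye": range(8, 16),
    "left_eyebrow": range(16, 26),
    "right_eyebrow": range(26, 36),
    "nose": range(36, 48),
    "mouth": range(48, 58),  # assumes the truncated range ends at point 57
}

def split_landmarks(points):
    """Slice a (58, 3) array of (x, y, z) landmarks into named regions."""
    return {name: points[list(idx)] for name, idx in FACE_REGIONS.items()}

landmarks = np.zeros((58, 3))  # x = width, y = height, z = depth
regions = split_landmarks(landmarks)
print({name: arr.shape for name, arr in regions.items()})
```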
“…Summarizing such salient scenes is essential because FPV videos tend to be redundant [2]. However, FPV videos are unstable and noisy compared to third-person view (TPV) videos, and most existing methods of video summarization mainly focus on handling the stable scenes in a TPV video [3]-[19]. Moreover, the following differences between FPV and TPV videos substantially complicate summarizing FPV videos compared to summarizing TPV videos.…”
Section: Introduction
confidence: 99%