2019 IEEE International Conference on Healthcare Informatics (ICHI)
DOI: 10.1109/ichi.2019.8904713
Multimodal Attention Network for Trauma Activity Recognition from Spoken Language and Environmental Sound

Abstract: Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) during a trauma resuscitation. Previous work has mainly focused on using various sensor data including image, RFID, and vital signals to generate the trauma event log. However, spoken language and environmental sound, which contain rich communication and contextual information necessary for trauma team cooperation, are still largely ignored. In this paper, we propose a multimodal attention network (MAN) that uses both …
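The abstract describes fusing spoken-language and environmental-sound features with an attention mechanism. As a rough illustration only (this is not the authors' implementation; the dimensions, variable names, and single-query cross-attention design are all assumptions), fusing an utterance embedding with a sequence of audio-frame features via scaled dot-product attention might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Scaled dot-product attention: one query vector attends over a
    sequence of key/value vectors and returns their weighted average.
    query: (d,); keys, values: (t, d); returns: (d,)."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # (t,) similarity per audio frame
    weights = softmax(scores)            # attention distribution over frames
    return weights @ values              # fused (d,) representation

rng = np.random.default_rng(0)
text_emb = rng.standard_normal(64)            # utterance embedding (hypothetical dim)
audio_frames = rng.standard_normal((20, 64))  # 20 environmental-sound frame features
fused = cross_attention(text_emb, audio_frames, audio_frames)
```

A classifier head would then map `fused` (optionally concatenated with `text_emb`) to activity labels; multi-head variants run several such attentions in parallel over projected subspaces.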

Cited by 5 publications (16 citation statements). References 15 publications.
“…Text-based activity recognition employed the transcript of the verbal communication between the medical team to predict the activity type. Recent research applied a multi-head attention architecture [13] to predict a speech-reliant activity from the transcripts and the environmental sound [6]. The drawback in this approach is that obtaining the text requires additional automatic speech recognition (ASR).…”
Section: Related Work
confidence: 99%
“…The audio modality was used as an auxiliary to other modalities in works [5], [6]. These papers analyzed the audio ability to improve the accuracy of the activity recognition.…”
Section: Related Work
confidence: 99%