Language-Based Process Phase Detection in the Trauma Resuscitation

Gu, Yue; Li, Xinyu; Chen, Shuhong; Li, Huangcan; Farneth, Richard A.; Marsic, Ivan; Burd, Randall S.

doi:10.1109/ichi.2017.50

Cited by 6 publications

(8 citation statements)

References 23 publications

“…To compare the proposed MAN with previous models, we first re-implemented the approaches in [7], [21]. Since the baseline approaches also used audio or text as input, we retrained them on the trauma dataset with the same training-testing split.…”

Section: Experiments and Evaluation (mentioning)

confidence: 99%

“…The result in Table III shows the MAN model outperforms the baselines by 6.2% and 7.8% accuracy, respectively. Because the distance between relevant sentences may vary in different cases, it is hard to define a fixed window size as in [7]. Compared to the hierarchical LSTM (H-LSTM) model that using 20s as the context window size to predict the present activity, our model achieves better performance using only present verbal sentence without relying on any context information.…”

Section: Experiments and Evaluation (mentioning)

confidence: 99%
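The fixed context window mentioned in the statement above can be pictured with a minimal Python sketch. It is not the H-LSTM baseline itself: the `Utterance` type, its `start` field, and the `context_window` helper are hypothetical names introduced here, and the utterances are assumed to be sorted by start time. Only the 20 s window size comes from the quoted statement.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float  # seconds since the start of the case (hypothetical field)
    text: str


def context_window(utterances, index, window_s=20.0):
    """Return earlier utterances that start within window_s seconds
    of utterances[index]. Assumes the list is sorted by start time."""
    now = utterances[index].start
    return [u for u in utterances[:index] if now - u.start <= window_s]
```

With a fixed `window_s`, a relevant sentence spoken 25 s earlier is silently dropped, which is the limitation the statement points out.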

“…To the best of our knowledge, this is the first research that introduces an architecture using language information and context audio for trauma activity recognition. Secondly, other study [7] uses language to identify trauma phases, which are high-level states opposed to this papers focus on specific low-level activities. We also consider environmental sound and build a multimodal model, which is more generalizable than a text-only model; the environmental sound can be seen as a complementary resource for the existing models.…”

Section: Introduction (mentioning)

confidence: 99%

Multimodal Attention Network for Trauma Activity Recognition from Spoken Language and Environmental Sound

Zhang, Zhao, et al. 2019

2019 IEEE International Conference on Healthcare Informatics (ICHI)

Self Cite

Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) during a trauma resuscitation. Previous work has mainly focused on using various sensor data including image, RFID, and vital signals to generate the trauma event log. However, spoken language and environmental sound, which contain rich communication and contextual information necessary for trauma team cooperation, are still largely ignored. In this paper, we propose a multimodal attention network (MAN) that uses both verbal transcripts and environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module, and forms a final shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital. We achieved 72.4% accuracy with 0.705 F1 score, demonstrating that our proposed architecture is useful and efficient. These results also show that using spoken language and environmental audio indeed helps identify hard-to-recognize activities, compared to previous approaches. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.

“…We used 64 filter banks to extract the MFSCs and extracted both the delta and double delta coefficients. Instead of resizing the MFSC feature maps into the same size as in [18], we selected 64 as the context window size and 15 frames as the shift window to segment the entire MFSC map. In particular, given an audio clip, our MFSC map is a 4D array with size n×64×64×3, where n is the number of shift windows.…”

Section: Feature Extraction (mentioning)

confidence: 99%
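The segmentation described in the statement above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `segment_mfsc` is hypothetical, the input is assumed to be a precomputed `(T, 64)` array of log mel filter-bank energies (one row per frame), the window of 64 and shift of 15 are assumed to be counted in frames, and simple finite differences (`np.gradient`) stand in for whatever delta computation the original work used.

```python
import numpy as np


def segment_mfsc(static, window=64, shift=15):
    """Cut a (T, 64) MFSC map into overlapping windows with 3 channels.

    Channels are static, delta, and double-delta features.
    Returns an array of shape (n, window, 64, 3), where n is the
    number of shift windows that fit into the T frames.
    """
    static = np.asarray(static, dtype=np.float32)
    num_frames, num_bands = static.shape
    if num_frames < window:
        # Clip is too short for even one window.
        return np.empty((0, window, num_bands, 3), dtype=np.float32)

    # Assumption: finite differences along time approximate the deltas.
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    feats = np.stack([static, delta, delta2], axis=-1)  # (T, 64, 3)

    starts = range(0, num_frames - window + 1, shift)
    return np.stack([feats[s:s + window] for s in starts])
```

For example, a 200-frame clip yields `segment_mfsc(np.zeros((200, 64))).shape == (10, 64, 64, 3)`, matching the n×64×64×3 layout in the statement. Trailing frames that do not fill a whole window are dropped in this sketch.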

Deep Multimodal Learning for Emotion Recognition in Spoken Language

Chen, Marsic 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts the high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features. Second, we fuse all features by using a three-layer deep neural network to learn the correlations across modalities and train the feature extraction and fusion modules together, allowing optimal global fine-tuning of the entire structure. We evaluated the proposed framework on the IEMOCAP dataset. Our result shows promising performance, achieving 60.4% in weighted accuracy for five emotion categories.

“…More recently, several studies used deep learning techniques to predict intentions from speech [13] and detect medical phases during trauma resuscitation [14]. These studies have focused on deriving the meaning of the sentences using feature extraction from speech logs.…”

Section: Related Work (mentioning)

confidence: 99%

An Analysis of Speech as a Modality for Activity Recognition during Complex Medical Teamwork

Jagannath, Sarcevic, Marsic 2018

Proceedings of the 12th EAI International Conference on Pervasive Computing Technologies for Healthcare

Self Cite

We analyzed the nature of verbal communication among team members in a dynamic medical setting of trauma resuscitation to inform the design of a speech-based automatic activity recognition system. Using speech transcripts from 20 resuscitations, we identified common keywords and speech patterns for different resuscitation activities. Based on these patterns, we developed narrative schemas (speech “workflow” models) for the five most frequently performed activities and applied linguistic models to represent relationships between sentences. We evaluated the narrative schemas with 17 new cases, finding that all five schemas adequately represented speech during activities and could serve as a basis for speech-based activity recognition. We also identified similarities between narrative schemas of different activities. We conclude with design implications and challenges associated with speech-based activity recognition in complex medical processes.
