We introduce a video-based system for recognizing concurrent activities during teamwork in a clinical setting. To preserve patient and provider privacy during system development, we precomputed spatio-temporal features rather than storing raw video. We extended the Inflated 3D ConvNet (I3D) model for concurrent activity recognition. During training, we fine-tuned the weights of the final stages of I3D using loss back-propagated from the fully-connected layer. We then filtered the model's outputs to remove noisy predictions. We evaluated the system on five activities performed during trauma resuscitation, the initial management of injured patients in the emergency department. Our system achieved a mean average precision (AP) of 74% over these five activities and outperformed previous systems designed for the same domain. We visualized feature maps from the model, showing that the system learned to focus on image regions relevant to the performance of each activity.
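The abstract does not specify how the noisy predictions are filtered. As one plausible sketch (the function names `smooth_scores` and `to_labels`, the median filter, the window size, and the 0.5 threshold are all illustrative assumptions, not the authors' method), a temporal median filter over per-clip sigmoid scores suppresses isolated false spikes and brief dropouts before binarizing each activity track:

```python
from statistics import median

def smooth_scores(scores, window=5):
    """Median-filter a sequence of per-clip sigmoid scores for one activity.

    Isolated spikes or dropouts shorter than half the window are suppressed,
    which is one simple way to remove noisy frame-level predictions.
    (Illustrative choice of filter; the paper does not name its method.)
    """
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(median(scores[lo:hi]))
    return out

def to_labels(scores, threshold=0.5):
    """Binarize smoothed scores into activity-present (1) / absent (0)."""
    return [1 if s >= threshold else 0 for s in scores]

# A noisy score track for one activity: a spurious spike at index 2
# and a one-clip dropout at index 8 inside a genuine activity interval.
raw = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9]
smoothed = smooth_scores(raw, window=5)
labels = to_labels(smoothed)
# The spike at index 2 is suppressed and the dropout at index 8 is filled.
```

Because activities are concurrent, such a filter would run independently on each activity's score track rather than on a single mutually exclusive label sequence.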