In this paper, we evaluate the performance of transformer-based natural language processing models in analyzing team communication captured during a live training event. We apply a multi-class confusion matrix technique to identify patterns in the performance of two models: one that recognizes dialogue acts and one that classifies how information flows between team members. The dialogue act recognition model was particularly accurate on utterances related to acknowledgement, commanding, and providing information. For information flow, the model performed well on utterances labeled as commands from the bottom and middle of the chain of command, although the error analysis revealed frequent misclassifications of utterances that provide information up and down the chain of command. Results of the multi-class confusion matrix technique provide insight into performance at a more granular level, which may guide model improvements and clarify how the models can be applied to new datasets.
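As an illustration of the kind of error analysis described above (not the authors' code), the sketch below builds a multi-class confusion matrix over a hypothetical dialogue-act label set and reports per-class misclassifications; the label names and data are invented for the example, and scikit-learn's `confusion_matrix` is assumed as the implementation.

```python
# Illustrative sketch of multi-class confusion matrix error analysis.
# The label set and predictions are hypothetical, not from the paper's data.
import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = ["acknowledge", "command", "provide_info"]  # assumed dialogue-act labels

y_true = ["command", "acknowledge", "provide_info",
          "command", "provide_info", "acknowledge"]
y_pred = ["command", "acknowledge", "command",
          "command", "provide_info", "acknowledge"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)

# Row-normalize so each row shows, for a true class, how its
# utterances were distributed across predicted classes.
cm_norm = cm / cm.sum(axis=1, keepdims=True)

# Off-diagonal cells reveal which class pairs the model confuses.
for i, true_label in enumerate(LABELS):
    for j, pred_label in enumerate(LABELS):
        if i != j and cm[i, j] > 0:
            print(f"{true_label} misclassified as {pred_label}: "
                  f"{cm[i, j]} ({cm_norm[i, j]:.0%})")
```

Inspecting the off-diagonal cells in this way is what surfaces patterns such as the provide-information confusions noted above, which aggregate accuracy alone would hide.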