Automation is the holy grail of performance assessment: cheap and reliable automated systems that produce consistent feedback on performance. Many such systems have been proposed that accurately measure the state of a product or the outcome of a process, and procedural faults can be detected and even mitigated without human intervention. In industrial production and professional sports, this is a natural part of business. In studies of macrocognitive team performance, however, human appraisal is still king. This study investigates how reliably human observers assess the performance of virtual teams, and what they base their assessments on when they can monitor only one team member at a time. The results show that expert observers place strong emphasis on task outcomes and communication and are generally reliable raters of team performance, but that there are several aspects they cannot rate reliably under these circumstances, e.g., team workload, stress, and collaborative problem-solving. Using simple algorithms, this study shows that task scores combined with quantitative communication metrics can be used to estimate team performance ratings that closely match the expert observers' assessments in a virtual team setting. The implication is that automated systems can produce numeric team performance estimates with reasonable accuracy and reliability compared to observer ratings.