Human activity detection and recognition have broad applications in military and homeland security. These tasks are very complicated, however, especially when multiple persons perform concurrent activities in confined spaces that impose significant obstruction, occlusion, and observability uncertainty. In this paper, our primary contribution is a dedicated taxonomy and kinematic ontology developed for in-vehicle group human activities (IVGA). Second, we describe a set of hand-observable patterns that represent certain IVGA examples. Third, we propose two classifiers for hand gesture recognition and compare their performance individually and jointly. Finally, we present a variant of the Hidden Markov Model for Bayesian tracking, recognition, and annotation of hand motions, which enables spatiotemporal inference for human group activity perception and understanding. To validate our approach, we use both synthetic video imagery (graphical data from a virtual environment) and video imagery from a real physical environment to verify the performance of these hand gesture classifiers, and we measure their efficiency and effectiveness based on the proposed Hidden Markov Model for tracking and interpreting dynamic spatiotemporal IVGA scenarios.