Recording workers’ activities is an important but burdensome management task for site supervisors. The last decade has seen a growing trend toward vision‐based activity recognition; however, recognizing workers’ activities in far‐field surveillance videos remains understudied. This study proposes a hierarchical statistical method for recognizing workers’ activities in far‐field surveillance videos. The method consists of two steps. First, a deep action recognition method was used to recognize workers’ actions, and a new fusion strategy was proposed to account for the characteristics of far‐field surveillance videos. On the far‐field surveillance data set, the deep action recognition method with the new fusion strategy achieved performance (0.84 average accuracy) comparable to that of the original method on public data sets. Second, a Bayesian nonparametric hidden semi‐Markov model was introduced to model and infer workers’ activities from the recognized action sequences. The latent states of the fitted Bayesian model captured workers’ activities in terms of state duration distributions and state transition distributions, which are indispensable for understanding workers’ time allocation. Preliminary results illustrate that the activity information learned by the Bayesian model has the potential to support objective work sampling, personal physical fatigue monitoring, trade‐level health risk assessment, and process‐based quality control. The limitations of this study are also discussed.
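As a rough illustration of the second step, the sketch below fits a weak‐limit approximation of a Bayesian nonparametric (HDP) hidden semi‐Markov model to a per‐frame action‐label sequence using the pyhsmm library. This is not the paper’s implementation: the action vocabulary size, hyperparameter values, truncation levels, and the synthetic label sequence are all assumptions made for demonstration only.

```python
# Illustrative sketch (not the paper's implementation): fitting a weak-limit
# HDP-HSMM to a per-frame action-label sequence with the pyhsmm library.
# The number of actions, hyperparameters, and the synthetic sequence are
# assumptions made for demonstration only.
import numpy as np
import pyhsmm
import pyhsmm.basic.distributions as distributions

NUM_ACTIONS = 6   # assumed size of the action vocabulary produced by step 1
N_MAX = 10        # weak-limit truncation on the number of latent activities

# Synthetic stand-in for a recognized action sequence (integer label per frame).
rng = np.random.default_rng(0)
action_seq = np.repeat(rng.integers(0, NUM_ACTIONS, size=40),
                       rng.integers(20, 80, size=40))

# Categorical emissions over action labels and Poisson-distributed state durations.
obs_distns = [distributions.Categorical(alpha_0=1.0, K=NUM_ACTIONS)
              for _ in range(N_MAX)]
dur_distns = [distributions.PoissonDuration(alpha_0=2 * 30, beta_0=2)
              for _ in range(N_MAX)]

model = pyhsmm.models.WeakLimitHDPHSMM(
    alpha=6.0, gamma=6.0, init_state_concentration=6.0,
    obs_distns=obs_distns, dur_distns=dur_distns)

model.add_data(action_seq, trunc=200)  # truncate durations for tractability

# Gibbs sampling over latent activity states, durations, and transitions.
for _ in range(100):
    model.resample_model()

# The inferred state sequence segments the footage into activities; duration
# and transition distributions can then be read off the fitted model.
print(model.stateseqs[0][:50])
```

In a fitted model of this kind, each latent state’s duration distribution indicates how long the corresponding activity tends to last, and the transition distributions describe how workers move between activities, which is the information the abstract points to for work sampling and time‐allocation analysis.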