In this paper, we address the generation of semantic labels describing the headgear accessories carried out by people in a scene under surveillance, only using depth information obtained from a Time-of-Flight (ToF) camera placed in an overhead position. We propose a new method for headgear accessories classification based on the design of a robust processing strategy that includes the estimation of a meaningful feature vector that provides the relevant information about the people’s head and shoulder areas. This paper includes a detailed description of the proposed algorithmic approach, and the results obtained in tests with persons with and without headgear accessories, and with different types of hats and caps. In order to evaluate the proposal, a wide experimental validation has been carried out on a fully labeled database (that has been made available to the scientific community), including a broad variety of people and headgear accessories. For the validation, three different levels of detail have been defined, considering a different number of classes: the first level only includes two classes (hat/cap, and no hat/cap), the second one considers three classes (hat, cap and no hat/cap), and the last one includes the full class set with the five classes (no hat/cap, cap, small size hat, medium size hat, and large size hat). The achieved performance is satisfactory in every case: the average classification rates for the first level reaches 95.25%, for the second one is 92.34%, and for the full class set equals 84.60%. In addition, the online stage processing time is 5.75 ms per frame in a standard PC, thus allowing for real-time operation.