This paper presents the first performance evaluation of local shape descriptors in probabilistic volumetric models (PVMs) learned from multi-view aerial imagery of large-scale urban scenes. The PVM offers a dense solution to the multi-view stereo problem, handling in a probabilistic manner the ambiguities caused by highly reflective surfaces, varying illumination conditions, registration errors, and sensor noise. A GPU-based octree implementation guarantees the scalability of the PVM to large urban models, encouraging its application to higher-level 3-d computer vision tasks. Local descriptors form the basis of many shape-based applications, and their performance has been studied extensively for image-based applications. Local descriptors have also been popular for 3-d shape understanding, but most 3-d descriptors have been applied to point cloud data collected with range sensors or to polygonal meshes of CAD models. This work investigates the performance of several local shape descriptors in the PVM, which is learned from image data and in which surface ambiguities are common and explicitly modeled. Descriptors are evaluated on object classification accuracy using Bag-of-Words models. This evaluation is a step toward characterizing a new probabilistic representation for 3-d scene understanding.