Scene classification plays an important role in understanding high-resolution satellite (HRS) remotely sensed imagery. For remotely sensed scenes, both color information and texture information provide the discriminative ability in classification tasks. In recent years, substantial performance gains in HRS image classification have been reported in the literature. One branch of research combines multiple complementary features based on various aspects such as texture, color and structure. Two methods are commonly used to combine these features: early fusion and late fusion. In this paper, we propose combining the two methods under a tree of regions and present a new descriptor to encode color, texture and structure features using a hierarchical structure-Color Binary Partition Tree (CBPT), which we call the CTS descriptor. Specifically, we first build the hierarchical representation of HRS imagery using the CBPT. Then we quantize the texture and color features of dense regions. Next, we analyze and extract the co-occurrence patterns of regions based on the hierarchical structure. Finally, we encode local descriptors to obtain the final CTS descriptor and test its discriminative capability using object categorization and scene classification with HRS images. The proposed descriptor contains the spectral, textural and structural information of the HRS imagery and is also robust to changes in illuminant color, scale, orientation and contrast. The experimental results demonstrate that the proposed CTS descriptor achieves competitive classification results compared with state-of-the-art algorithms.