Most existing techniques for remote sensing (RS) scene classification annotate a scene image with only a single semantic label. However, with recent advances in remote sensing technology, high-resolution scenes contain more abundant information, so a single scene image often carries multiple semantic meanings (i.e., multiple labels). Multi-label RS scene image annotation is challenging because of the ambiguity between complicated scene contents and labels, which motivates us to present a novel algorithm based on multi-bag integration. First, to describe the semantic content of an RS scene image, we partition it into image patches defined by a regular grid and extract heterogeneous features from each patch. Second, two kinds of image instance bags, namely the segmented instance bag (SIB) and the layered instance bag (LIB), are designed to represent the scene image. Third, a Mahalanobis distance-based K-Medoids approach is applied to cluster SIB and LIB separately, converting each multi-instance bag into a single instance, and the two resulting single instances are concatenated to generate a more powerful scene-aware representation. Finally, a multi-class classification technique is used to predict the class labels. Experiments on real remote sensing images show that the proposed method is effective and outperforms a number of state-of-the-art approaches.
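The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the per-channel mean and standard deviation stand in for the heterogeneous patch features, two grid resolutions stand in for SIB and LIB (whose construction is not specified here), and a bag is mapped to a single instance by taking its minimum Mahalanobis distance to each medoid. All function names are hypothetical.

```python
import numpy as np

def grid_patches(image, rows, cols):
    """Split an H x W x C image into rows*cols patches on a regular grid."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h, rows + 1, dtype=int)
    xs = np.linspace(0, w, cols + 1, dtype=int)
    return [image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for i in range(rows) for j in range(cols)]

def patch_features(patch):
    """Assumed placeholder descriptor: per-channel mean and std."""
    flat = patch.reshape(-1, patch.shape[-1]).astype(float)
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

def mahalanobis(X, Y, VI):
    """Pairwise Mahalanobis distances; VI is the inverse covariance."""
    diff = X[:, None, :] - Y[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negative round-off

def k_medoids(X, k, VI, n_iter=50, seed=0):
    """Simple alternating K-Medoids under the Mahalanobis distance."""
    rng = np.random.default_rng(seed)
    D = mahalanobis(X, X, VI)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return X[medoids]

def bag_to_vector(bag, medoids, VI):
    """Assumed bag-to-instance mapping: min distance to each medoid."""
    return mahalanobis(bag, medoids, VI).min(axis=0)

def represent_scenes(images, k=3, grids=((4, 4), (2, 2))):
    """Build one bag per grid (stand-ins for SIB and LIB), convert each
    bag to a k-dimensional vector, and concatenate the results."""
    parts = []
    for rows, cols in grids:
        bags = [np.array([patch_features(p)
                          for p in grid_patches(img, rows, cols)])
                for img in images]
        pooled = np.vstack(bags)
        VI = np.linalg.pinv(np.cov(pooled, rowvar=False))
        medoids = k_medoids(pooled, k, VI)
        parts.append(np.array([bag_to_vector(b, medoids, VI) for b in bags]))
    return np.hstack(parts)
```

Calling `represent_scenes(images, k=3)` on a list of H x W x C arrays yields one row of length 6 per image (3 values from each of the two bags). Such rows could then be passed to any standard classifier for the final label prediction step, which is omitted here.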