Abstract. Multimedia data consists of several different types of data, such as numbers, text, images and audio, which usually need to be fused or integrated before analysis. This study investigates a feature-level aggregation approach that combines multimedia datasets to build heterogeneous ensembles for classification. The approach first aggregates the multimedia datasets at the feature level to form a single, large normalised dataset, then uses subsets of it to train classifiers with different learning algorithms. Finally, it applies three rules that select classifiers based on their accuracy, their diversity, or both, and builds heterogeneous ensembles from the selected classifiers. The method is tested on a multimedia dataset, and the results show that the heterogeneous ensembles outperform both the individual classifiers and homogeneous ensembles. However, in some cases the combined dataset may not produce better results than a single-media dataset.
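For illustration only, the feature-level aggregation step could be sketched as follows. This is a minimal sketch that makes several assumptions: each media type has already been converted to numeric feature vectors, every table lists the same instances in the same order, and min-max scaling to [0, 1] stands in for the normalisation, which is not specified here. The function name `aggregate_features` is hypothetical, and the sketch does not cover classifier training or the three selection rules.

```python
from typing import List, Sequence


def aggregate_features(media: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Hypothetical helper: fuse per-media feature tables at the feature level.

    Assumes row i of every table describes the same instance i, and uses
    min-max scaling (an assumed choice) as the normalisation.
    """
    n = len(media[0])
    if any(len(table) != n for table in media):
        raise ValueError("all media tables must describe the same instances")

    # Feature-level fusion: row i of the result concatenates row i of every table.
    combined = [[x for table in media for x in table[i]] for i in range(n)]

    # Per-column min-max scaling so that no media type dominates by scale.
    cols = list(zip(*combined))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]

    # Constant columns carry no information and are mapped to 0.0.
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in combined
    ]


numeric = [[1.0, 10.0], [3.0, 20.0]]  # e.g. two numeric attributes per instance
text = [[0.2], [0.6]]                 # e.g. one text-derived feature per instance
print(aggregate_features([numeric, text]))  # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```

Scaling each column independently after concatenation keeps the features of each media type on a comparable range before subsets of the combined dataset are passed to different learning algorithms.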