Diabetic retinopathy (DR) can cause irreversible eye damage, including blindness, and the prognosis improves with early diagnosis. The International Classification of Diabetic Retinopathy Severity Scale (ICDRSS) defines five stages of DR. Modern, cost-effective techniques for automatic DR screening and staging from fundus images are based on deep learning (DL). Combining several diverse individual DL models into one ensemble can yield higher classification accuracy. We propose a new approach to creating model diversity in an ensemble by manipulating the training input data, using the original image dataset and four preprocessed variants of it. Publicly available datasets provide labels for all five stages, but some contain poor-quality images. Therefore, to enhance classification performance, the models were trained on the six-class DDR dataset, which includes a class for poor-quality, ungradable images. The solution was evaluated on the APTOS dataset, which contains only the five ICDRSS classes. Classification results are presented for two ensemble convolutional neural network (CNN) models, based on the Xception and EfficientNetB4 architectures, using two fusion approaches. Our proposed ensemble models outperformed all single deep learning architectures in overall accuracy and Cohen's kappa, with the best results obtained with the EfficientNetB4 architecture.
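As a rough illustration of how the outputs of such an ensemble might be fused and scored, the following minimal sketch combines per-model class probabilities and computes Cohen's kappa. It makes several assumptions that the abstract does not confirm: the two fusion approaches are taken to be probability averaging and majority voting, the ungradable class is assumed to sit at index 5 and is simply ignored when predicting on a five-class test set, and kappa is computed in its unweighted form. The function names `soft_vote`, `hard_vote`, and `cohen_kappa` and the toy probabilities are hypothetical.

```python
import numpy as np

N_GRADES = 5  # ICDRSS stages 0..4; index 5 is assumed to be the "ungradable" class


def soft_vote(probs):
    """Average probabilities over models, then pick the most likely grade.

    probs has shape (n_models, n_samples, 6). The ungradable column is
    dropped before the argmax (an assumption for a five-class test set).
    """
    mean_probs = probs.mean(axis=0)
    return mean_probs[:, :N_GRADES].argmax(axis=1)


def hard_vote(probs):
    """Let each model vote for one grade and return the most voted grade.

    Ties are resolved in favour of the lowest grade index.
    """
    votes = probs[:, :, :N_GRADES].argmax(axis=2)    # (n_models, n_samples)
    counts = np.eye(N_GRADES)[votes].sum(axis=0)     # (n_samples, N_GRADES)
    return counts.argmax(axis=1)


def cohen_kappa(y_true, y_pred, n_classes=N_GRADES):
    """Unweighted Cohen's kappa computed from a confusion matrix.

    Undefined (division by zero) when expected agreement equals 1.
    """
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 3 hypothetical models, 2 images, 6 class probabilities each.
probs = np.array([
    [[0.05, 0.0, 0.9, 0.0, 0.0, 0.05], [0.3, 0.05, 0.05, 0.05, 0.05, 0.5]],
    [[0.1, 0.4, 0.3, 0.1, 0.05, 0.05], [0.3, 0.05, 0.05, 0.05, 0.05, 0.5]],
    [[0.1, 0.4, 0.3, 0.1, 0.05, 0.05], [0.3, 0.05, 0.05, 0.05, 0.05, 0.5]],
])
print("soft voting:", soft_vote(probs))
print("hard voting:", hard_vote(probs))
```

In this toy input the two rules disagree on the first image, because one very confident model outweighs two less confident ones under averaging but not under majority voting. This is one reason an ensemble study might report both.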