Diabetes is one of the top ten causes of death among adults worldwide. People with diabetes are prone to eye diseases such as diabetic retinopathy (DR), which damages the blood vessels in the retina and can result in vision loss. DR grading is essential for early diagnosis and effective treatment, and for slowing the progression of DR to vision impairment. Existing automatic solutions are mostly based on traditional image processing and machine learning techniques, which leaves a large gap in more generic detection and grading of DR. Various deep learning models, such as convolutional neural networks (CNNs), have previously been used for this purpose. To improve DR grading, this paper proposes a novel solution based on an ensemble of vision transformers, a state-of-the-art family of deep learning models. The proposed method was trained and evaluated on a challenging public DR dataset from a 2015 Kaggle challenge. This dataset is highly imbalanced and has five severity levels: No DR, Mild, Moderate, Severe, and Proliferative DR. The experiments showed that the proposed solution outperforms existing methods in terms of precision (47%), recall (45%), F1 score (42%), and Quadratic Weighted Kappa (QWK) (60.2%). It also achieved a low inference time (1.12 seconds). The proposed solution can therefore help examiners grade DR more accurately than by manual means.
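For readers unfamiliar with the Quadratic Weighted Kappa mentioned above, the following is a minimal sketch of how the metric is commonly computed for ordinal grades such as the five DR severity levels. It is not taken from the paper's code: the function name `quadratic_weighted_kappa`, its parameters, and the toy labels are hypothetical, and the grades are assumed to be encoded as integers 0 (No DR) through 4 (Proliferative DR).

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Illustrative QWK for integer grades in [0, num_classes - 1].

    Assumption: grades are encoded 0..4 (No DR .. Proliferative DR).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: observed[i, j] counts samples with
    # true grade i that were predicted as grade j.
    observed = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(observed, (y_true, y_pred), 1.0)

    # Quadratic penalty: disagreements that are further apart on the
    # grading scale are penalized more heavily.
    grades = np.arange(num_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (num_classes - 1) ** 2

    # Expected counts if true and predicted grades were independent,
    # scaled to the same total as the observed matrix.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Note: the denominator is zero when every true and predicted label
    # falls in a single class; that degenerate case is not handled here.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Toy labels (hypothetical, not from the dataset).
perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
reversed_ = quadratic_weighted_kappa([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])
```

Because the penalty grows with the squared distance between grades, confusing No DR with Proliferative DR costs far more than confusing Mild with Moderate. This makes the metric better suited to imbalanced ordinal grading than plain accuracy.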