Diabetes is a common and serious global disease that damages blood vessels in the eye, leading to vision loss. Early and accurate diagnosis is crucial to reduce the risk of visual impairment. Typical deep learning (DL) methods for diabetic retinopathy (DR) grading are often time‐consuming and yield unsatisfactory detection performance due to inadequate representation of lesion features. To overcome these challenges, this research proposes a new automated mechanism for detecting and classifying DR, aiming to identify DR severities and stages. To capture feature characteristics from DR samples, a conjugated attention mechanism and a vision transformer are combined within a collective net model, which automatically generates feature maps for diagnosing DR. These feature maps are then fused through the feature fusion function of a fused attention net model, which calculates attention weights to produce the most discriminative feature map. Finally, DR cases are identified and classified using the kernel extreme learning machine (KELM) model. To evaluate DR severity, this work uses four benchmark datasets: APTOS 2019, MESSIDOR‐2, DiaRetDB1 V2.1, and DIARETDB0. To eliminate data noise and unwanted variations, two preprocessing steps are applied: contrast enhancement and illumination correction. Experimental results evaluated with well‐known indicators demonstrate that the proposed method achieves a higher accuracy of 99.63% than other baseline methods. This research contributes to the development of efficient DR screening techniques capable of automatically identifying DR severity levels at an early stage.
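The final classification stage uses a kernel extreme learning machine. A KELM replaces the random hidden layer of a standard ELM with a kernel matrix and solves a regularized linear system in closed form, β = (I/C + Ω)⁻¹T, where Ω is the kernel matrix over training samples, C a regularization constant, and T the one-hot target matrix. The sketch below is a minimal NumPy illustration of this general technique, not the paper's implementation; the RBF kernel choice and the `gamma`/`C` values are assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise squared Euclidean distances, then Gaussian kernel
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

class KELM:
    """Minimal kernel extreme learning machine (illustrative sketch)."""
    def __init__(self, C=10.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        n = X.shape[0]
        # One-hot encode integer class labels into the target matrix T
        T = np.eye(int(y.max()) + 1)[y]
        # Kernel matrix Omega over the training set
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights: beta = (I/C + Omega)^{-1} T
        self.beta = np.linalg.solve(np.eye(n) / self.C + K, T)
        return self

    def predict(self, X_new):
        # Project new samples through the kernel and pick the max-score class
        return (rbf_kernel(X_new, self.X_train, self.gamma) @ self.beta).argmax(1)

# Toy usage on two hypothetical feature clusters (stand-ins for fused DR feature maps)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = KELM().fit(X, y)
acc = (model.predict(X) == y).mean()
```

Because training reduces to one linear solve instead of iterative gradient descent, KELM fits quickly, which aligns with the paper's goal of a less time‐consuming screening pipeline.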