Diabetic retinopathy (DR) is one of the most common causes of visual impairment worldwide, and its diagnosis requires reliable automated detection methods. Numerous conventional methods have been developed for the early detection of DR, yet research in this field remains insufficient, leaving room for advances in diagnosis. This paper presents a hybrid model, HybridFusionNet, that integrates a vision transformer (ViT) with attention mechanisms. The model uses deep features from the DR stages to improve both binary classification (Bcl) and multi-class classification (Mcl). Both the SAN and ViT models improve the recognition accuracy (Acc) in the two settings. HybridFusionNet achieves a competitive Acc in Bcl and Mcl of 91% and 99%, respectively. These results indicate that the model is suitable for improving the diagnosis of DR.