Memes have become increasingly prevalent across social media platforms in recent years. While memes entertain through humour, some exploit that humour as a cover to spread misogynistic and hateful content targeting women online. Most previously proposed methods for misogyny detection have concentrated on either textual or visual content alone, and research on multimodal data that combines images and text remains scarce. We propose DeVi, a framework that combines DeBERTa and a Vision Transformer through an attention-based late fusion strategy for automatic misogyny identification in memes. We evaluate the framework on the two subtasks of SemEval-2022 Task 5 using the MAMI dataset: Subtask A, a binary misogynous meme identification task, and Subtask B, a multilabel task that identifies the type of misogyny. The proposed framework achieves F1-scores of 0.865 on Subtask A and 0.783 on Subtask B, outperforming existing multimodal models on both subtasks and demonstrating the effectiveness and adaptability of the DeVi framework.
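To make the fusion step concrete, the sketch below shows one plausible realisation of attention-based late fusion over pooled DeBERTa and ViT embeddings. The specific checkpoints (microsoft/deberta-v3-base, google/vit-base-patch16-224), the projection dimension, and the attention scoring layer are illustrative assumptions, not the exact DeVi configuration.

```python
# Minimal sketch (not the authors' released code) of attention-based late fusion
# of DeBERTa text features and ViT image features for misogyny classification.
# Checkpoint names, projection size, and fusion details are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, ViTModel

class DeViFusion(nn.Module):
    def __init__(self, text_model="microsoft/deberta-v3-base",
                 image_model="google/vit-base-patch16-224",
                 proj_dim=512, num_labels=1):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)
        self.image_encoder = ViTModel.from_pretrained(image_model)
        # Project both modalities into a shared space before fusion.
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, proj_dim)
        self.image_proj = nn.Linear(self.image_encoder.config.hidden_size, proj_dim)
        # Scalar attention score per modality vector (late fusion).
        self.attn = nn.Linear(proj_dim, 1)
        self.classifier = nn.Linear(proj_dim, num_labels)

    def forward(self, input_ids, attention_mask, pixel_values):
        # Pooled text representation: first token of DeBERTa's last hidden state.
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state[:, 0]
        # Pooled image representation: ViT [CLS] token.
        img = self.image_encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        feats = torch.stack([self.text_proj(txt), self.image_proj(img)], dim=1)  # (B, 2, D)
        weights = torch.softmax(self.attn(torch.tanh(feats)), dim=1)             # (B, 2, 1)
        fused = (weights * feats).sum(dim=1)                                     # (B, D)
        return self.classifier(fused)  # raw logits; apply sigmoid for Subtask A
```

For the multilabel Subtask B, num_labels would equal the number of misogyny categories and the logits would be trained with a per-label sigmoid loss (e.g., BCEWithLogitsLoss).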