According to research, classifiers and detectors are less accurate when images are blurry, have low contrast, or have other flaws which raise questions about the machine learning model’s ability to recognize items effectively. The chest X-ray image has proven to be the preferred image modality for medical imaging as it contains more information about a patient. Its interpretation is quite difficult, nevertheless. The goal of this research is to construct a reliable deep-learning model capable of producing high classification accuracy on chest x-ray images for lung diseases. To enable a thorough study of the chest X-ray image, the suggested framework first derived richer features using an ensemble technique, then a global second-order pooling is applied to further derive higher global features of the images. Furthermore, the images are then separated into patches and position embedding before analyzing the patches individually via a vision transformer approach. The proposed model yielded 96.01% sensitivity, 96.20% precision, and 98.00% accuracy for the COVID-19 Radiography Dataset while achieving 97.84% accuracy, 96.76% sensitivity and 96.80% precision, for the Covid-ChestX-ray-15k dataset. The experimental findings reveal that the presented models outperform traditional deep learning models and other state-of-the-art approaches provided in the literature.