Predicting subcellular protein localization has become a popular topic due to its utility in understanding disease mechanisms and developing innovative drugs. With the rapid advancement of automated microscopic imaging technology, approaches using bio-images for protein subcellular localization have gained a lot of interest. The Human Protein Atlas (HPA) project is a macro-initiative that aims to map the human proteome utilizing antibody-based proteomics and related c. Millions of images have been tagged with single or multiple labels in the HPA database. However, fewer techniques for predicting the location of proteins have been devised, with the majority of them relying on automatic single-label classification. As a result, there is a need for an automatic and sustainable system capable of multi-label classification of the HPA database. Deep learning presents a potential option for automatic labeling of protein’s subcellular localization, given the vast image number generated by high-content microscopy and the fact that manual labeling is both time-consuming and error-prone. Hence, this research aims to use an ensemble technique for the improvement in the performance of existing state-of-art convolutional neural networks and pretrained models were applied; finally, a stacked ensemble-based deep learning model was presented, which delivers a more reliable and robust classifier. The F1-score, precision, and recall have been used for the evaluation of the proposed model’s efficiency. In addition, a comparison of existing deep learning approaches has been conducted with respect to the proposed method. The results show the proposed ensemble strategy performed exponentially well on the multi-label classification of Human Protein Atlas images, with recall, precision, and F1-score of 0.70, 0.72, and 0.71, respectively.