The Omicron and Delta variants of SARS-CoV-2, the virus that causes COVID-19, have been associated with high mortality among infected patients. Our objective is therefore to detect the Omicron and Delta variants from lung CT-scan images. We designed an ensemble model that combines a Capsule Network (CapsNet) with the pre-trained CNN architectures VGG-16, DenseNet-121, and Inception-v3 to produce a reliable and robust model for diagnosing Omicron and Delta variant data. Although a single model can achieve remarkable accuracy, its predictions alone can be difficult to trust. An ensemble, by contrast, combines the predictions of several models by majority voting, so that an error made by one model can be outvoted by the others. We adopt transfer learning to reuse previously learned parameters and thereby reduce the amount of training data the architectures require. In addition, CapsNet performs consistently regardless of changes in the position, size, and orientation of the input image. The proposed ensemble model achieved an accuracy of 99.93%, an AUC of 0.999, and a precision of 99.9%. Finally, the framework is deployed as a web application on a local cloud so that these variants can be diagnosed remotely.
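To illustrate how majority (hard) voting fuses the outputs of several classifiers, here is a minimal sketch. It assumes each model has already produced one label per CT-scan image; the label names ("Omicron", "Delta", "Normal") and the tie-breaking rule (prefer the earliest model's vote among tied labels) are hypothetical choices for illustration, not details taken from the paper, whose actual fusion rule may differ (for example, soft voting over probabilities).

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Fuse per-model labels by hard (majority) voting.

    predictions[m][i] is the label that model m assigns to image i.
    Ties are broken in favor of the earliest model's vote among the
    tied labels (a hypothetical rule chosen for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_images = len(predictions[0])
    if any(len(p) != n_images for p in predictions):
        raise ValueError("all models must predict the same number of images")

    fused = []
    for i in range(n_images):
        # Collect one vote per model for image i.
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Walk votes in model order and take the first label with the top count.
        winner = next(v for v in votes if counts[v] == top)
        fused.append(winner)
    return fused


# Hypothetical predictions from the four base models on three images.
capsnet = ["Omicron", "Delta", "Normal"]
vgg16 = ["Omicron", "Delta", "Delta"]
densenet121 = ["Delta", "Delta", "Normal"]
inception_v3 = ["Omicron", "Omicron", "Omicron"]

print(majority_vote([capsnet, vgg16, densenet121, inception_v3]))
# prints ['Omicron', 'Delta', 'Normal']
```

Even though each base model misclassifies at least one image here, the fused output follows the majority on every image, which is the intuition behind preferring an ensemble over a single model.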