The main objective of this study is to develop a robust deep learning-based framework that distinguishes COVID-19, Community-Acquired Pneumonia (CAP), and Normal cases based on volumetric chest CT scans acquired in different imaging centers using different scanners and technical settings. We demonstrated that, although our proposed model was trained on a relatively small dataset acquired from only one imaging center using a specific scanning protocol, it performs well on heterogeneous test sets obtained by multiple scanners using different technical parameters. We also showed that the model can be updated via an unsupervised approach to cope with the data shift between the train and test sets and to enhance its robustness upon receiving a new external dataset from a different center. More specifically, we extracted the subset of test images for which the model generated confident predictions and used this subset, together with the training set, to retrain and update the benchmark model (i.e., the model trained on the initial training set). Finally, we adopted an ensemble architecture to aggregate the predictions from multiple versions of the model. For initial training and development, an in-house dataset of 171 COVID-19, 60 CAP, and 76 Normal cases was used, containing volumetric CT scans acquired from one imaging center using a single scanning protocol and standard radiation dose. To evaluate the model, we retrospectively collected four different test sets to investigate how shifts in data characteristics affect the model’s performance. The test cases include CT scans with characteristics similar to those of the training set as well as noisy low-dose and ultra-low-dose CT scans. In addition, some test CT scans were obtained from patients with a history of cardiovascular diseases or surgeries. This collection of test sets is referred to as the “SPGC-COVID” dataset. 
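The confident-subset retraining and the ensemble step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the confidence criterion nor the aggregation rule, so the threshold on the top-class probability (`threshold=0.9`) and the probability-averaging (soft-voting) ensemble are assumptions, and all helper names are hypothetical. Model training itself is omitted; each model version is represented only by its per-case class probabilities.

```python
from typing import List, Sequence, Tuple

# Hypothetical class order: index 0 = COVID-19, 1 = CAP, 2 = Normal.
CLASSES = ("COVID-19", "CAP", "Normal")


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a class-probability vector."""
    return max(range(len(row)), key=lambda c: row[c])


def select_confident(probs: Sequence[Sequence[float]],
                     threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (test_index, pseudo_label) for cases whose top-class
    probability is at least `threshold` (an assumed confidence rule)."""
    selected = []
    for i, row in enumerate(probs):
        label = argmax(row)
        if row[label] >= threshold:
            selected.append((i, label))
    return selected


def build_augmented_training_set(train, test_scans, selected):
    """Append the confidently predicted test scans, with their
    pseudo-labels, to the labeled training set used for retraining."""
    return list(train) + [(test_scans[i], label) for i, label in selected]


def ensemble_average(prob_sets):
    """Soft voting (assumed rule): average class probabilities over
    model versions, then predict the class with the highest average."""
    n_models = len(prob_sets)
    n_cases = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    avg = [
        [sum(model[i][c] for model in prob_sets) / n_models
         for c in range(n_classes)]
        for i in range(n_cases)
    ]
    return avg, [argmax(row) for row in avg]


# Usage with made-up probabilities for three test scans.
test_probs = [[0.95, 0.03, 0.02], [0.5, 0.3, 0.2], [0.05, 0.05, 0.9]]
selected = select_confident(test_probs)  # cases 0 and 2 pass the threshold
augmented = build_augmented_training_set([("scan_a", 0)], ["t0", "t1", "t2"], selected)
print([CLASSES[label] for _, label in selected])
```

Retraining on `augmented` would yield an updated model version whose probabilities can be added to `prob_sets`; the number of versions combined is not stated in the abstract.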
The entire test dataset used in this study contains 51 COVID-19, 28 CAP, and 51 Normal cases. Experimental results indicate that our proposed framework performs well on all test sets, achieving an overall accuracy of 96.15% (95% CI: [91.25–98.74]), a COVID-19 sensitivity of 96.08% (95% CI: [86.54–99.5]), a CAP sensitivity of 92.86% (95% CI: [76.50–99.19]), and a Normal sensitivity of 98.04% (95% CI: [89.55–99.95]), where the confidence intervals are computed at a significance level of 0.05. The one-vs-rest AUC values are 0.993 (95% CI: [0.977–1]), 0.989 (95% CI: [0.962–1]), and 0.990 (95% CI: [0.971–1]) for the COVID-19, CAP, and Normal classes, respectively. The experimental results also demonstrate that the proposed unsupervised enhancement approach improves the performance and robustness of the model when evaluated on varied external test sets.
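The reported proportions and AUC values can be recomputed from per-case predictions with a short sketch. The abstract states only that the intervals use a significance level of 0.05; the exact Clopper-Pearson binomial interval below is an assumption about the method, and the function names are hypothetical. The AUC helper returns only the one-vs-rest point estimate (the Mann-Whitney pairwise-ranking form), not the AUC confidence interval, whose method is not stated.

```python
from math import comb
from typing import Sequence, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly for precision."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05,
                    iters: int = 100) -> Tuple[float, float]:
    """Exact two-sided CI for the proportion k/n (assumed method),
    solving each tail equation by bisection on p in [0, 1]."""
    target = alpha / 2.0
    lower, upper = 0.0, 1.0
    if k > 0:  # lower bound solves P(X >= k | p) = alpha/2
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if binom_tail_ge(k, n, mid) < target:
                lo = mid  # tail too small: p must increase
            else:
                hi = mid
        lower = (lo + hi) / 2.0
    if k < n:  # upper bound solves P(X <= k | p) = alpha/2
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if binom_cdf(k, n, mid) > target:
                lo = mid  # tail too large: p must increase
            else:
                hi = mid
        upper = (lo + hi) / 2.0
    return lower, upper


def one_vs_rest_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Fraction of (positive, negative) pairs in which the positive case
    gets the higher score for the class of interest; ties count 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


lo, hi = clopper_pearson(50, 51)
print(f"Normal sensitivity {50 / 51:.2%}, 95% CI [{lo:.2%}, {hi:.2%}]")
```

A Normal sensitivity of 98.04% on 51 cases corresponds to 50 correct classifications, hence the call `clopper_pearson(50, 51)`. For the one-vs-rest AUC, pass each case's predicted probability for the class of interest together with a flag marking whether the case truly belongs to that class.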