In precision medicine, using multi-omics data for patient stratification holds great promise for delivering treatments tailored to comprehensive individual biological profiles. The clinical potential of (multi-)omics data, however, is significantly limited by confounding factors in the data, such as noise from experimental procedures or biological signals unrelated to the question of interest. Because confounders can bias patient clustering, deep learning frameworks for deconfounding, such as autoencoders, have been developed. Despite encouraging initial results, these frameworks have seen limited validation on clustering tasks with multi-omics data. We propose four novel multi-omics variational autoencoder (VAE) frameworks for clustering, each based on a different deconfounding strategy and designed to reduce confounding effects while preserving true biological patterns. To evaluate them, we simulate artificial confounders with linear, non-linear and categorical effects on gene expression and DNA methylation data from the TCGA pan-cancer study. We find the conditional multi-omics VAE to be clearly superior to the other models in stability, deconfounding ability, and recovery of biologically driven cluster structures. In contrast, adversarial training for deconfounding proves difficult to optimize, resulting in poor deconfounding performance. Our study reveals substantial differences between autoencoder-based frameworks for clustering multi-omics data in the presence of confounders. These findings may help researchers select an appropriate framework for multi-omics studies, ultimately facilitating better patient stratification for precision medicine.
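To make the three kinds of simulated confounder effects more concrete, here is a minimal Python sketch of one way they could be injected into a samples × features matrix. It is an illustration only, not the simulation procedure used in the study: the helper name `simulate_confounders`, the Gaussian confounder values, the random per-feature loadings, the squared term used for the non-linear effect, and the three batch levels are all assumptions made for this example.

```python
import math
import random


def simulate_confounders(data, strength=1.0, n_batches=3, seed=0):
    """Inject artificial confounders into a samples x features matrix.

    Hypothetical helper for illustration only (assumed design, not the
    study's method). Returns three confounded copies of `data` plus the
    simulated confounder values for each sample.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    # Assumed per-feature loadings: how strongly each feature responds to c.
    loadings = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    # Assumed per-batch, per-feature offsets for the categorical confounder.
    batch_offsets = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                     for _ in range(n_batches)]

    out = {"linear": [], "nonlinear": [], "categorical": [], "c": [], "batch": []}
    for row in data:
        c = rng.gauss(0.0, 1.0)       # continuous confounder value for this sample
        b = rng.randrange(n_batches)  # categorical confounder (batch label)
        out["c"].append(c)
        out["batch"].append(b)
        # Linear effect: every feature is shifted proportionally to c.
        out["linear"].append([x + strength * w * c for x, w in zip(row, loadings)])
        # Non-linear effect: shift proportional to c squared (one arbitrary choice).
        out["nonlinear"].append([x + strength * w * c ** 2 for x, w in zip(row, loadings)])
        # Categorical effect: shift determined only by the sample's batch.
        out["categorical"].append([x + strength * o for x, o in zip(row, batch_offsets[b])])
    return out


# Usage: 6 samples with 4 features each (toy values, not real omics data).
toy = [[math.log1p(i + j) for j in range(4)] for i in range(6)]
sim = simulate_confounders(toy, strength=0.5, seed=42)
```

Under this setup, a deconfounding model would be judged by whether clustering its latent space recovers the structure of the unconfounded input rather than grouping samples by the simulated `c` or `batch` values.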