Esophageal squamous cell carcinoma (ESCC) is one of the deadliest cancer types with a poor prognosis due to the lack of symptoms in the early stages and a delayed diagnosis. The present study aimed to identify the risk factors significantly associated with prognosis and to search for novel effective diagnostic modalities for patients with early-stage ESCC. mRNA and methylation data of patients with ESCC and the corresponding clinical information were downloaded from The Cancer Genome Atlas (TCGA) database, and the representation features were screened using deep learning autoencoder. The univariate Cox regression model was used to select the prognosis-related features from the representation features. K-means clustering was used to cluster the TCGA samples. Support vector machine classifier was constructed based on the top 75 features mostly associated with the risk subgroups obtained from K-means clustering. Two ArrayExpress datasets were used to verify the reliability of the obtained risk subgroups. The differentially expressed genes and methylation genes (DEGs and DMGs) between the risk subgroups were analyzed, and pathway enrichment analysis was performed. A total of 500 representation features were produced. Using K-means clustering, the TCGA samples were clustered into two risk subgroups with significantly different overall survival rates. Joint multimodal representation strategy, which showed a good model fitness (C-index=0.760), outperformed early-fusion autoencoder strategy. The joint representation learning-based classification model had good robustness. A total of 1,107 DEGs and 199 DMGs were screened out between the two risk subgroups. The DEGs were involved in 70 pathways, the majority of which were correlated with metastasis and proliferation of various cancer types, including cytokine-cytokine receptor interaction, cell adhesion molecules PPAR signaling pathway, pathways in cancer, transcriptional misregulation in cancer and ECM-receptor interaction pathways. The two survival subgroups obtained via the joint representation learning-based model had good robustness, and had prognostic significance for patients with ESCC.