Skin cancer is a dangerous form of cancer that develops slowly in skin cells. Delays in diagnosing and treating these malignant skin conditions can have serious repercussions; conversely, early detection of skin cancer has been shown to improve treatment outcomes. This paper proposes DeepMetaForge, a deep-learning framework for skin cancer detection from metadata-accompanied images. The proposed framework utilizes BEiT, a vision transformer pre-trained via masked image modeling, as the image-encoding backbone. We further propose merging the encoded metadata with the derived visual features while simultaneously compressing the aggregate information, simulating how images with metadata are interpreted. Experimental results on four public datasets of dermoscopic and smartphone skin lesion images reveal that the best configuration of our proposed framework yields a macro-average F1 of 87.1%, averaged over the datasets. An empirical scalability analysis further shows that the proposed framework can be deployed under a variety of machine-learning paradigms, including on low-resource devices and as a service. The findings not only point to the feasibility of nationwide telemedicine solutions for skin cancer that could benefit those in need of quality healthcare, but also open doors to many intelligent applications in medicine where images and metadata are collected together, such as disease detection from CT-scan images and recognition of patients' expressions from facial images.
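
To make the fusion idea concrete, the following is a minimal PyTorch sketch of one way to combine a pre-trained BEiT image encoder with encoded metadata and compress the joint representation before classification. The class name, layer sizes, metadata encoder, and checkpoint (`microsoft/beit-base-patch16-224-pt22k` from Hugging Face transformers) are illustrative assumptions, not the paper's exact DeepMetaForge module.

```python
import torch
import torch.nn as nn
from transformers import BeitModel

class MetadataFusionClassifier(nn.Module):
    """Illustrative sketch: fuse BEiT image embeddings with encoded
    metadata, then compress the concatenated features jointly.
    Layer sizes and structure are assumptions, not the paper's design."""

    def __init__(self, num_metadata_features: int, num_classes: int = 2):
        super().__init__()
        # BEiT backbone, pre-trained via masked image modeling.
        self.backbone = BeitModel.from_pretrained(
            "microsoft/beit-base-patch16-224-pt22k"
        )
        hidden = self.backbone.config.hidden_size  # 768 for beit-base
        # Project tabular metadata into the same embedding space.
        self.meta_encoder = nn.Sequential(
            nn.Linear(num_metadata_features, hidden), nn.GELU()
        )
        # Merge image and metadata features while compressing
        # the aggregate representation back to `hidden` dimensions.
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.GELU(), nn.Dropout(0.1)
        )
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, pixel_values: torch.Tensor, metadata: torch.Tensor):
        img_feat = self.backbone(pixel_values=pixel_values).pooler_output
        meta_feat = self.meta_encoder(metadata)
        fused = self.fusion(torch.cat([img_feat, meta_feat], dim=-1))
        return self.classifier(fused)

# Example: a batch of 2 RGB images (224x224) with 10 metadata fields.
model = MetadataFusionClassifier(num_metadata_features=10)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 10))
print(logits.shape)  # torch.Size([2, 2])
```

Concatenating the two feature vectors and passing them through a shared compression layer is one common fusion strategy; the actual DeepMetaForge merging mechanism is described in the body of the paper.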