Estimating traffic variables and providing traffic information are the most important components of intelligent transportation systems. Advances in technology have led to the collection of various traffic sensor data, and the nonlinear dependencies between traffic variables have motivated the development of models based on deep learning. However, some road segments have missing data because a sensor is not installed, a sensor malfunctions, or a communication error occurs. In this study, we propose a deep multimodal model for estimating traffic speed on such missing-data segments. We implement the proposed model using two heterogeneous traffic sensors: a vehicle detection system and dedicated short-range communication. The model consists of three multilayer perceptrons: two modality models, each of which receives one modality as input, and one fusion model, which receives the concatenated outputs of the two modality models as input. To evaluate estimation performance, we use three performance measures to compare the deep multimodal model with an arithmetic average model and a single-modality model. The results show that both the single-modality model and the proposed deep multimodal model outperform the arithmetic average model. In particular, the deep multimodal model achieves the highest accuracies, 90.5% on weekends and 92.1% during peak hours, without reflecting the true value. The proposed deep multimodal model makes three contributions: high accuracy using two different sensors, robustness across various periods, and suitability for real-time application owing to its fast computation time.
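The three-network structure described above can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the ReLU activation, the random untrained weights, the function names, and the input readings are all assumptions chosen for illustration. It shows only how two modality networks feed one fusion network through concatenation.

```python
import random


def init_mlp(sizes, rng):
    """Create (weights, biases) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x, final_relu=True):
    """Forward pass with ReLU; the last layer stays linear if final_relu is False."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        if final_relu or not is_last:
            x = [max(0.0, v) for v in x]
    return x


def multimodal_forward(vds_net, dsrc_net, fusion_net, vds_x, dsrc_x):
    """Run each modality network, concatenate their outputs, then fuse."""
    h_vds = mlp_forward(vds_net, vds_x)
    h_dsrc = mlp_forward(dsrc_net, dsrc_x)
    fused_input = h_vds + h_dsrc  # list concatenation of the two outputs
    return mlp_forward(fusion_net, fused_input, final_relu=False)[0]


# Assumed sizes (hypothetical): 4 VDS features, 3 DSRC features, 8 hidden units.
N_VDS, N_DSRC, HIDDEN = 4, 3, 8
rng = random.Random(0)
vds_net = init_mlp([N_VDS, HIDDEN, HIDDEN], rng)
dsrc_net = init_mlp([N_DSRC, HIDDEN, HIDDEN], rng)
fusion_net = init_mlp([2 * HIDDEN, HIDDEN, 1], rng)

# Made-up speed readings (km/h) from neighbouring segments, for illustration only.
vds_x = [62.0, 58.5, 60.1, 61.3]
dsrc_x = [55.2, 57.8, 56.4]
speed = multimodal_forward(vds_net, dsrc_net, fusion_net, vds_x, dsrc_x)
```

Because the weights are untrained, the returned value carries no meaning as a speed estimate. The sketch only makes the data flow concrete: the fusion network's input width equals the sum of the two modality networks' output widths.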