Identifying sedimentary facies is fundamental to oil and gas exploration. In recent years, geologists have used deep learning methods to predict sedimentary facies. However, these methods are usually restricted to a single data modality, which limits the practicality and generalizability of the resulting models. Because oilfield data come from multiple heterogeneous sources that are difficult to fuse in a complementary way, this paper proposes a sedimentary facies identification technique based on multimodal data fusion. The technique uses multimodal data from core wells, including well logs, physical properties, textual descriptions, and core images: each modality is first used to predict sedimentary facies separately, and the unimodal predictions are then combined through decision-level fusion into a comprehensive prediction. Applied to 12 core wells on the northwestern margin of the Junggar Basin, China, the method achieved an accuracy of over 90% on both the validation and test sets. With this method, the sedimentary microfacies of a newly drilled core well can be predicted, and the interpretation of the sedimentary framework in the well area can be updated in real time as data from new core wells become available, significantly improving the efficiency and accuracy of oil and gas exploration and development.
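The abstract does not state how the unimodal predictions are combined at the decision level. As a minimal sketch, assuming each unimodal model outputs a probability for every facies class, decision-level fusion can be illustrated as weighted soft voting: the per-modality probability vectors are averaged with per-modality weights, and the class with the highest fused probability is chosen. The function names `fuse_decisions` and `predict_facies`, the modality keys, the facies class names, the weights, and the probability values below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple


def fuse_decisions(
    modality_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted soft-voting fusion of per-modality class probabilities.

    Hypothetical sketch: modality_probs maps a modality name to one unimodal
    model's predicted probability for each facies class. Every vector must
    list the classes in the same order.
    """
    if not modality_probs:
        raise ValueError("need at least one modality")
    n_classes = len(next(iter(modality_probs.values())))
    if weights is None:
        # Assumption: equal weights when no modality is preferred.
        weights = {m: 1.0 for m in modality_probs}
    fused = [0.0] * n_classes
    total = 0.0
    for modality, probs in modality_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"{modality}: expected {n_classes} classes")
        # A modality missing from `weights` contributes nothing.
        w = weights.get(modality, 0.0)
        total += w
        for k, p in enumerate(probs):
            fused[k] += w * p
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the total weight so the fused vector stays a distribution.
    return [f / total for f in fused]


def predict_facies(
    modality_probs: Dict[str, List[float]],
    class_names: List[str],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, List[float]]:
    """Return the facies label with the highest fused probability."""
    fused = fuse_decisions(modality_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return class_names[best], fused


# Hypothetical facies classes and unimodal outputs for one depth interval.
classes = ["channel", "sheetflood", "interchannel"]
probs = {
    "logging": [0.6, 0.3, 0.1],
    "physical_properties": [0.5, 0.4, 0.1],
    "text": [0.2, 0.7, 0.1],
    "core_image": [0.7, 0.2, 0.1],
}

label, fused = predict_facies(probs, classes)
print(label, [round(p, 3) for p in fused])  # channel [0.5, 0.4, 0.1]
```

With equal weights the example interval is labeled `channel`; giving the textual-description model a weight of 3 (and the others 1) shifts the fused decision to `sheetflood`. Learned weights, majority voting, or a trained meta-classifier are alternative decision-level rules; which one the authors used is not specified in the abstract.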