Crime prediction is central to crime prevention because it helps law enforcement agencies determine the most effective patrol strategy. Various approaches have been used to predict criminal activity. Nonetheless, the environment and the nature of the information available for crime prediction are constantly changing. Although social media content is a potentially useful source of public sentiment, it has been ignored by prediction models. The use of social media for sharing information and ideas has surged. Twitter, in particular, is regarded as a valuable platform for gathering public sentiments, emotions, perspectives, and feedback. Accordingly, sentiment analysis techniques have been developed to determine whether the text of a tweet conveys a positive or negative viewpoint on a crime incident. We therefore investigate the potential and advantages of fusing information from the sentiment and crime modalities. In this paper, a ConvBiLSTM model is trained on features that are extracted independently from the tweet and crime modalities at the vector level and then fused into a single representation that captures the information from both modalities. Experiments were conducted on two datasets. The first consists of crime incident data obtained from the Chicago Police Department, covering September 1 to September 30, 2019. The second comprises tweets containing crime-related terminology specific to Chicago. Crime prediction using multimodal data fusion with ConvBiLSTM outperforms other models, achieving an accuracy of 97.75%.
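
As an illustration of the vector-level fusion step described above, the following is a minimal sketch rather than the implementation used in this study. It assumes that fusion is done by concatenating the per-sample feature vectors of the two modalities; the abstract does not state the fusion operator. The function name `fuse_features` and the example values are hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    tweet_vecs: Sequence[Sequence[float]],
    crime_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Fuse two modalities by concatenating their feature vectors per sample.

    Assumption: concatenation is the fusion operator (not confirmed by the
    source). Row i of each input must describe the same sample.
    """
    if len(tweet_vecs) != len(crime_vecs):
        raise ValueError("both modalities must contain the same number of samples")
    # One fused vector per sample: tweet features followed by crime features.
    return [list(t) + list(c) for t, c in zip(tweet_vecs, crime_vecs)]


# Hypothetical example: 2 samples, 2 sentiment features and 3 crime features each.
tweet_features = [[0.1, 0.9], [0.7, 0.3]]
crime_features = [[1.0, 0.0, 3.0], [0.0, 1.0, 5.0]]
fused = fuse_features(tweet_features, crime_features)
print(len(fused), len(fused[0]))
```

Under this assumption, each fused vector has as many entries as the two modality vectors combined, and it is this single representation that would be passed to the downstream classifier.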