Traditional sentiment analysis relies on text alone and ignores factors such as special symbols and emoticon images, so it cannot fully capture users' emotions. To address this, this paper proposes a sentiment analysis method for online homestay reviews based on image-text fusion. For the text dataset, Word2vec is first used to build a topic clustering model, and the corresponding topic attribute dictionary is then found through the topic center words; a Bayesian classifier is used for sentiment analysis and is compared with SVM and decision tree methods to evaluate its effect. For the image dataset, a Convolutional Neural Network (CNN) model is initialized by parameter migration, and the image sentiment classification model is obtained by fine-tuning the CNN after parameter migration. Finally, a fusion method is designed to calculate the image-text sentiment probability, from which the sentiment polarity is judged and compared with the user's score. The resulting accuracy is 88.6%, which is higher than that of either the text sentiment analysis model or the image sentiment analysis model alone. The experimental results show that image-text fusion sentiment analysis classifies image-text reviews better and more effectively avoids the problem of inconsistency between user ratings and the emotion expressed in comments.
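
The abstract does not state the fusion rule itself. As a minimal sketch only, assuming a simple weighted average of the positive-class probabilities produced by the text and image models, the final step might look like the following. The function names, the `text_weight` parameter, the 0.5 threshold, and the star-rating cutoff are all hypothetical and are not taken from the paper.

```python
def fuse_sentiment(p_text: float, p_image: float, text_weight: float = 0.5) -> float:
    """Combine two positive-sentiment probabilities by weighted average.

    Assumption: late fusion by weighted averaging; the paper's actual
    fusion formula is not given in the abstract.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    for p in (p_text, p_image):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    return text_weight * p_text + (1.0 - text_weight) * p_image


def polarity(p_positive: float, threshold: float = 0.5) -> str:
    """Map a fused probability to a polarity label (hypothetical threshold)."""
    return "positive" if p_positive >= threshold else "negative"


def matches_rating(p_positive: float, stars: int, positive_from: int = 4) -> bool:
    """Check whether the predicted polarity agrees with the user's star rating.

    Assumption: ratings of `positive_from` stars or more count as positive.
    """
    rated = "positive" if stars >= positive_from else "negative"
    return polarity(p_positive) == rated


if __name__ == "__main__":
    fused = fuse_sentiment(p_text=0.9, p_image=0.3, text_weight=0.5)
    print(polarity(fused), matches_rating(fused, stars=2))
```

A review whose fused polarity disagrees with its star rating, as in the usage lines above, is the kind of rating-comment inconsistency the abstract refers to.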