With the continuous advancement of networks and the rise of digital media, the amount of data produced has grown exponentially, and how to use these data to provide personalized services for users is a current research focus. To address the insufficient coverage of existing sentiment lexicons and the difficulty of constructing sentiment lexicons for specific fields, this study proposes a multi-pattern sentiment lexicon. Semi-supervised learning is used to address insufficient coverage: a semi-supervised classification algorithm is implemented that combines a large number of unlabeled samples with a small number of labeled samples. Optimized learning is used to address the difficulty of construction in specific fields: the corresponding domain-specific sentiment lexicon is built by adaptively adjusting sentiment word scores. Finally, the improved sentiment lexicon is used to build a sentiment analysis framework for digital media short texts. The framework was tested on the NLPCC dataset. Experiments show that it requires 87 iterations and achieves a recall of 0.912, an F1 value of 0.753, and an average accuracy of 83.39%, all better than the sentiment analysis framework that does not use the multi-pattern sentiment lexicon. In the simulation experiment, the recognition accuracy reached 85.88%, which was 16.85%, 11.57%, and 6.72% higher than in the test scenarios that each used one of the single sentiment lexicons selected in this study. These results show that the digital media short-text sentiment analysis framework built on the multi-pattern sentiment lexicon performs short-text sentiment analysis more accurately and efficiently, so that users' needs can be analyzed accurately and customized services provided precisely.
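
The idea of extending lexicon coverage from a small labeled seed plus many unlabeled short texts can be illustrated with a minimal self-training sketch. This is not the algorithm proposed in this study, only a simplified stand-in under stated assumptions: the function names `score_text` and `expand_lexicon`, the parameters `threshold`, `min_count`, and `rounds`, and the toy data are all hypothetical, and word scores are limited to the range from -1 to +1.

```python
from collections import defaultdict


def score_text(tokens, lexicon):
    """Sum the lexicon scores of known tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def expand_lexicon(seed_lexicon, unlabeled_texts, threshold=1.0,
                   min_count=2, rounds=3):
    """Self-training style lexicon expansion (illustrative assumption only).

    Each round pseudo-labels the texts whose absolute score reaches
    `threshold`, then assigns every unknown word that occurs in at least
    `min_count` pseudo-labeled texts the mean polarity (+1 or -1) of those
    texts. Words whose evidence cancels out to 0 are left out.
    """
    lexicon = dict(seed_lexicon)
    for _ in range(rounds):
        polarity_sum = defaultdict(float)
        count = defaultdict(int)
        for tokens in unlabeled_texts:
            s = score_text(tokens, lexicon)
            if abs(s) < threshold:
                continue  # not confident enough to pseudo-label this text
            label = 1.0 if s > 0 else -1.0
            for tok in set(tokens):
                if tok not in lexicon:
                    polarity_sum[tok] += label
                    count[tok] += 1
        new_words = {
            tok: polarity_sum[tok] / count[tok]
            for tok in count
            if count[tok] >= min_count and polarity_sum[tok] != 0
        }
        if not new_words:
            break  # nothing new was learned, so stop early
        lexicon.update(new_words)
    return lexicon


# Toy data (hypothetical): a two-word labeled seed and six unlabeled texts.
seed = {"good": 1.0, "bad": -1.0}
texts = [
    ["good", "awesome"],
    ["good", "awesome", "movie"],
    ["bad", "lame"],
    ["bad", "lame", "movie"],
    ["awesome", "plot"],
    ["awesome", "plot", "twist"],
]
expanded = expand_lexicon(seed, texts)
```

In this toy run, "awesome" and "lame" are learned in the first round, and "plot" only in the second, once "awesome" is available to score the texts that contain it. "movie" appears equally in positive and negative texts and is therefore not added, and "twist" occurs too rarely to pass `min_count`.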