Social-media user portraits are an important means of improving the quality of Internet information services. Current user profiling methods do not distinguish the emotional differences among users of different genders and ages on social media, where the data are multi-modal and domain sentiment labels are lacking. This paper applies sentiment analysis of images and text to improve label classification, incorporating gender and age differences into the sentiment analysis of multi-modal social-media user profiles. In the absence of domain sentiment labels, instance transfer learning is used to learn sentiment representations of text and images; semantic association learning over the multi-modal image and text data is realized; and a multi-modal attention mechanism is introduced to establish hidden alignment relationships between images and text, which are used to address the semantic and modal gaps between modalities. On this basis, a multi-modal user portrait label classification model (MPCM) is constructed. In an analysis of the sentiment data of users on Facebook, Twitter, and News, the MPCM method is compared with the naive Bayes, Latent Dirichlet Allocation, Tweet-LDA and LUBD-CM(3) methods in terms of accuracy, precision, recall and the F1-score. At a 95% confidence level, the MPCM method improves performance by 1% to 4%.
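
For reference, the four evaluation measures used in the comparison (accuracy, precision, recall, and the F1-score, the harmonic mean of precision and recall) can be computed from predicted and true labels as in the minimal sketch below. This is an illustration for a single binary label only, not the evaluation code used for MPCM; the function name `binary_metrics` and the toy label lists are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one binary label.

    Illustrative helper (hypothetical name); ratios with a zero
    denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for illustration): 1 = positive sentiment, 0 = other.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

For a multi-class label set such as user portrait labels, these per-label values would typically be averaged across labels (for example macro-averaging); which averaging scheme applies here is not stated in the abstract.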