Social interactions have changed in recent years. People increasingly post their thoughts, opinions, and sentiments on social media platforms through images and videos, providing a very rich source of data about the populations of different countries, communities, and other groups. Given the growing volume of data on the internet, manual analysis has become infeasible, so the process must be automated. In this work, we use two blog corpora that contain images and texts: the Cross-Media German Blog (CGB) corpus consists of German blog posts, while the Cross-Media Brazilian Blog (CBB) corpus contains Brazilian blog posts. Both corpora include ground-truth (GT) sentiment labels for images and texts, assigned according to human perception. In previous work, machine learning and lexicon-based techniques were applied to both corpora to detect the sentiment (negative, neutral, or positive) of images and texts, and the results were compared with the ground truth (based on the subjects' perception). In this work, we investigate a new hypothesis: that detecting faces and their emotions can improve sentiment classification accuracy on both the CBB and CGB datasets. We apply two methodologies to detect polarity from faces and evaluate the results against the image GT and the multimodal GT (the complete blog post, combining text and image). Our results indicate that facial emotion can be a relevant feature for blog sentiment classification.