Facial Expression Recognition (FER) systems aims to classify human emotions through facial expression as one of seven basic emotions: happiness, sadness, fear, disgust, anger, surprise and neutral. FER is a very challenging problem due to the subtle differences that exist between its categories. Even though convolutional neural networks (CNN) achieved impressive results in several computer vision tasks, they still do not perform as well in FER. Many techniques, like bilinear pooling and improved bilinear pooling, have been proposed to improve the CNN performance on similar problems. The accuracy enhancement they brought in multiple visual tasks, shows that their is still room for improvement for CNNs on FER. In this paper, we propose to use bilinear and improved bilinear pooling with CNNs for FER. This framework has been evaluated on three well known datasets, namely ExpW, FER2013 and RAF-DB. It has shown that the use of bilinear and improved bilinear pooling with CNNs can enhance the overall accuracy to nearly 3% for FER and achieve state-of-the-art results.