As facial analysis technology becomes more prevalent, bias in these tools is a growing source of concern. Causality has been proposed as a way to address bias, which has made counterfactuals a popular bias mitigation tool. In this paper, we systematically investigate the use of counterfactuals to achieve both statistical and causality-based fairness in facial expression recognition. We explore bias mitigation strategies based on counterfactual data augmentation at the pre-processing, in-processing, and post-processing stages, as well as a stacked approach that combines all three. At the in-processing stage, we propose using Siamese networks to suppress the differences between the predictions on the original and the counterfactual images. Our experimental results on RAF-DB with counterfactuals added show that: (1) the in-processing method outperforms the pre-processing and post-processing methods in terms of accuracy, F1 score, statistical fairness, and counterfactual fairness, and (2) stacking the pre-processing, in-processing, and post-processing stages provides the best performance.
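The in-processing idea summarized above, penalizing disagreement between the predictions on an original image and on its counterfactual, can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function names, the squared-difference penalty, and the weight `lam` are assumptions chosen for illustration, and the flip rate is only one simple proxy for counterfactual fairness. The two logit vectors stand in for the outputs of the two weight-sharing branches of a Siamese network.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])


def consistency_penalty(p_orig, p_cf):
    """Mean squared difference between the two prediction vectors (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(p_orig, p_cf)) / len(p_orig)


def siamese_loss(logits_orig, logits_cf, label, lam=1.0):
    """Task loss on both branches plus lam times the consistency penalty.

    lam is a hypothetical weighting, not a value taken from the paper.
    """
    p_orig = softmax(logits_orig)
    p_cf = softmax(logits_cf)
    task = cross_entropy(p_orig, label) + cross_entropy(p_cf, label)
    return task + lam * consistency_penalty(p_orig, p_cf)


def counterfactual_flip_rate(preds_orig, preds_cf):
    """Fraction of samples whose predicted label changes under the counterfactual."""
    flips = sum(o != c for o, c in zip(preds_orig, preds_cf))
    return flips / len(preds_orig)
```

When the two branches agree exactly, the penalty vanishes and the loss reduces to the ordinary classification loss on both images; as the predictions diverge, a larger `lam` pushes the model harder toward giving the same output for an image and its counterfactual.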