In recent years, predictive models have been increasingly used by education practitioners and stakeholders to derive actionable insights that support student success. Model selection (i.e., the decision of which predictive model to use) is usually based largely on the predictive performance of the models. Nevertheless, it has become important to consider fairness as an integral part of the criteria for model selection. Might a model be unfair towards certain demographic groups? Might it systematically perform poorly for certain demographic groups? Indeed, prior studies affirm this. Which model, then, should we choose? Additionally, prior studies suggest that demographic group imbalance in the training dataset is a source of such unfairness. If so, would the fairness of the predictive models improve if the demographic group distribution in the training dataset were balanced? This study seeks to answer these questions. First, we analyze the fairness of four commonly used state-of-the-art models that predict course success for three IT courses at a large public Australian university. Specifically, we investigate whether the models serve different demographic groups equally. Second, to address the identified unfairness (supposedly caused by the demographic group imbalance), we train the models on three types of \textit{balanced data} and investigate whether the unfairness is mitigated. We found that none of the predictive models was consistently fair across all three courses. This suggests that model selection decisions should be made carefully by both researchers and stakeholders, according to the requirements of the application domain. Furthermore, we found that balancing demographic groups (and class labels) may be an initial step but is not enough to ensure the fairness of predictive models in education. An implication of this is that the source of unfairness may not always be immediately apparent. Therefore, “blindly” attributing the unfairness to demographic group imbalance may cause the unfairness to persist even when the data becomes balanced. We hope that our findings can guide practitioners and relevant stakeholders in making well-informed model selection decisions.