Machine learning models are widely used to decide whether to accept or reject credit loan applications. However, much like human decisions, they may discriminate against specific groups of applicants, for instance those defined by age, gender, or race. In this paper, we investigate whether machine learning credit lending models are biased in a real case study, which concerns borrowers applying for loans in different regions of the United States. We show how to measure model fairness using different metrics, and we explore how explainable machine learning can add further insight. From a constructive viewpoint, we propose a propensity matching approach that can improve fairness.
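For concreteness, the kind of fairness measurement referred to above can be illustrated with a minimal sketch. The metric shown here (demographic parity difference, i.e. the gap in approval rates between two groups) is one common choice and is assumed for illustration only; the function name and toy data are hypothetical and not necessarily those used in the paper.

```python
# Minimal sketch (illustrative, not the paper's exact method):
# demographic parity difference for a binary loan-approval model.
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in approval rates between two groups.

    y_pred : array of 0/1 model decisions (1 = loan approved)
    group  : array of 0/1 protected-attribute labels
             (e.g., two age bands or two genders)
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_0 = y_pred[group == 0].mean()  # approval rate in group 0
    rate_1 = y_pred[group == 1].mean()  # approval rate in group 1
    return rate_1 - rate_0  # 0 means parity; the sign shows which group is favored

# Hypothetical usage on toy decisions:
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups    = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_difference(decisions, groups))  # -0.5
```

A value far from zero signals that approval rates differ across groups, which is the kind of disparity that fairness metrics of this family are designed to detect.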