Automatic Speech Recognition (ASR) systems have improved substantially over the past decade; however, these gains have not been shared equally by all speaker groups. Recent research shows that state-of-the-art (SOTA) ASR systems are biased against several types of speech, including non-native accented speech. To attain inclusive speech recognition, i.e., ASR that works for everyone irrespective of how one speaks or the accent one has, bias mitigation is essential.

In this thesis, two SOTA ASR systems, one based on a recurrent neural network (RNN) and the other on the transformer architecture, are built to uncover and quantify the bias against non-native accents. I focus on mitigating this bias through two approaches: data augmentation and more effective training methods. For data augmentation, an autoencoder-based cross-lingual voice conversion (VC) model is used to increase the amount of non-native accented training speech, in addition to augmentation through speed perturbation. Furthermore, I investigate two training methods, fine-tuning and Domain Adversarial Training (DAT), to see whether they use the available non-native accented speech data more effectively than standard training.

Experimental results for the transformer-based ASR model show that: (1) adding VC-generated and speed-perturbed data to the training set yields the best bias mitigation and the lowest word error rate (WER); (2) fine-tuning reduces the bias against non-native accents, but at the cost of native-accent performance; and (3) compared with standard training, DAT does not lead to further bias reduction. For the RNN-based ASR model, none of the four bias mitigation approaches shows a clear benefit.
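To make the DAT setup concrete, the sketch below shows the gradient reversal mechanism at its core: an auxiliary accent classifier is trained on the ASR encoder's output, and the reversed gradient pushes the encoder toward accent-invariant representations. This is a minimal PyTorch illustration under stated assumptions, not the thesis implementation; the module name AccentDiscriminator, the mean-pooling over time, and the parameters feat_dim, num_accents, and lambd are illustrative choices.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses the gradient sign in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip (and scale) the gradient flowing back into the encoder so it
        # learns representations the accent classifier cannot exploit.
        return -ctx.lambd * grad_output, None


class AccentDiscriminator(nn.Module):
    """Hypothetical accent classifier attached to the ASR encoder output."""

    def __init__(self, feat_dim: int, num_accents: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_accents),
        )

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, time, feat_dim); mean-pool over time, reverse
        # the gradient, then classify the accent.
        pooled = encoder_out.mean(dim=1)
        return self.classifier(GradReverse.apply(pooled, self.lambd))
```

In such a setup, the accent classification loss is added to the ASR loss during training; because of the reversal, minimizing the joint loss trains the classifier to predict the accent while training the encoder to discard accent information.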