This thesis examines techniques for improving the robustness of automatic speech recognition (ASR) systems against noise distortions. The study is important because the performance of ASR systems degrades dramatically in adverse environments, which greatly limits the deployment of speech recognition applications in realistic conditions. To this end, we examine a feature compensation approach and a discriminative model training approach to improve the robustness of speech recognition systems.

The degradation of recognition performance is mainly due to the statistical mismatch between the clean-trained acoustic model and the noisy testing speech features. To reduce this feature-model mismatch, we propose to normalize the temporal structure of both the training and the testing speech features. The temporal structure of the speech features is represented by the power spectral density (PSD) functions of the feature trajectories, and we normalize it by applying equalizing filters to these trajectories. The proposed filter is called the temporal structure normalization (TSN) filter. Compared with other temporal filters used in speech recognition, the advantage of the TSN filter is its adaptability to changing environments. The TSN filter can also be viewed as a feature normalization technique that normalizes the PSD function of the features, whereas other normalization methods, such as histogram equalization (HEQ), normalize the probability density function (p.d.f.) of the features. Experimental study shows that the TSN filter outperforms other state-of-the-art temporal filters on both the small-vocabulary Aurora-2 task and the large-vocabulary Aurora-4 task.

In the second study, we improve the robustness of speech recognition by improving the generalization capability of the acoustic model rather than by reducing the feature-model mismatch. In the log-likelihood score domain, noise distortion causes the log-likelihood score of noisy features to deviate from that of clean features. The deviation may move the noisy features to the wrong side of the decision boundary trained from clean features and hence cause recognition errors. To improve performance, discriminative training (DT) methods, including minimum classification error (MCE), maximum mutual information (MMI) and soft-margin estimation (SME), are applied to improve the generalization capability of the acoustic model; this is achieved by increasing the margin, i.e. the desired minimum distance from the training samples to the decision boundary. Experimental study shows that by improving the acoustic model's generalization capability with SME and other DT methods, speech recognition performance can be improved even when the testing data are mismatched with the training data. In addition, the margin-based SME is slightly more effective than MCE and MMI in terms of increasing the margin and improving robustness. It is also observed that DT methods ...
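To make the feature-domain idea concrete, the following Python sketch illustrates PSD equalization of a single feature trajectory. It is a minimal illustration under stated assumptions, not the thesis implementation: the function and variable names are hypothetical, Welch's method is assumed for PSD estimation, and a linear-phase FIR design is assumed for the equalizing filter.

import numpy as np
from scipy.signal import welch, firwin2

def tsn_like_equalizer(trajectory, ref_freqs, ref_psd, numtaps=33, nperseg=64):
    # trajectory : 1-D array, one cepstral coefficient over time (frames).
    # ref_freqs, ref_psd : reference PSD, e.g. the average Welch PSD of the
    #   same feature dimension estimated from clean training utterances.
    # Estimate the PSD of the observed (e.g. noisy test) trajectory;
    # fs=1.0 treats the frame rate as the sampling unit.
    freqs, obs_psd = welch(trajectory, fs=1.0, nperseg=nperseg)
    # Interpolate the reference PSD onto the same frequency grid.
    ref = np.interp(freqs, ref_freqs, ref_psd)
    # Equalizer magnitude response: square root of the PSD ratio,
    # clipped to avoid huge gains where the observed PSD is near zero.
    gain = np.clip(np.sqrt(ref / np.maximum(obs_psd, 1e-10)), 0.0, 10.0)
    # Design a linear-phase FIR filter with that magnitude response
    # (odd numtaps so the response at Nyquist need not be zero).
    taps = firwin2(numtaps, freqs, gain, fs=1.0)
    # 'same'-mode convolution keeps the trajectory length and compensates
    # the symmetric filter's group delay.
    return np.convolve(trajectory, taps, mode='same')

In this sketch the equalizer would be applied per trajectory to both training and testing features, matching the normalization of temporal structure described above; the actual TSN filter design in the thesis may differ in its PSD estimator and filter construction.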
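For the model-domain study, one commonly cited formulation of the soft-margin estimation objective makes the role of the margin explicit; it is given here as a hedged sketch and is not necessarily the exact criterion used in the thesis:

\[
\min_{\Lambda}\; \frac{\lambda}{\rho} \;+\; \frac{1}{N}\sum_{i=1}^{N}\bigl(\rho - d(O_i,\Lambda)\bigr)\,\mathbb{1}\bigl[d(O_i,\Lambda) \le \rho\bigr],
\]

where $d(O_i,\Lambda)$ is a separation measure between the correct and competing hypotheses for utterance $O_i$, $\rho$ is the soft margin, and $\lambda$ balances a large margin against the empirical hinge-style loss. Enlarging $\rho$ pushes training samples further from the decision boundary, which is the sense in which SME, MCE and MMI are compared above.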