Automatic speech recognition (ASR) is a vital technology that transforms spoken language into written text, supporting accessibility and effective communication. Despite continued advances in deep learning, speech recognition remains a formidable task for languages with limited data resources, such as Tamil. This work presents the development of an ASR system using real-time, spontaneous Tamil speech collected from people's conversations in public places. The pre-trained wav2vec2 XLSR model is fine-tuned on this corpus. The model captures diverse acoustic features and patterns and generalizes across multiple dialects, making it adaptable to real-world speech. The fine-tuned model is evaluated in a range of noisy environments such as markets, hospitals, and shops. On evaluation metrics such as word error rate (WER) and character error rate (CER), the proposed model achieves lower error rates than baseline ASR models.
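As a minimal sketch of the approach described above, the following Python snippet outlines how a pre-trained wav2vec2 XLSR checkpoint can be loaded with a CTC head over a Tamil character vocabulary and scored with WER/CER using the Hugging Face `transformers` and `jiwer` libraries; the checkpoint name, `vocab.json` file, and decoding choices here are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch: fine-tuning setup and WER/CER evaluation for a
# wav2vec2-XLSR-based Tamil ASR model. Vocabulary file and checkpoint
# are placeholders; actual training data and hyperparameters differ.
import torch
import jiwer
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level tokenizer built from a Tamil vocabulary (vocab.json assumed).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the cross-lingual XLSR checkpoint and attach a CTC head sized to the
# Tamil character vocabulary; the convolutional feature encoder stays frozen.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()

def evaluate_utterance(waveform, reference_text):
    """Greedy CTC decoding and WER/CER scoring for one 16 kHz utterance."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
    pred_ids = torch.argmax(logits, dim=-1)
    hypothesis = processor.batch_decode(pred_ids)[0]
    return jiwer.wer(reference_text, hypothesis), jiwer.cer(reference_text, hypothesis)
```

In this recipe, the CTC output layer is randomly initialized over the new vocabulary while the pre-trained encoder weights are reused, which is the usual way cross-lingual wav2vec2 representations are adapted to a low-resource target language.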