This paper proposes a content-based spam email classification approach that applies text preprocessing techniques such as stop-word removal, stemming, and lemmatization. NLP techniques are incorporated into the preprocessing stage to avoid loss of information. Machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), are then applied to classify each email as ham or spam. The preprocessing used in this approach substantially improves the classification results. Two standard datasets, Enron and SpamAssassin, are used to evaluate the performance of the models. The approach achieved an accuracy of 98.3% on the Enron dataset and 99.2% on the SpamAssassin dataset. These outcomes demonstrate the efficacy of preprocessing techniques for spam email classification. The accuracy of the proposed models was further validated on a dataset of personal emails sourced from a Yahoo mailbox. Yahoo's built-in classifier achieved an accuracy of 89%, whereas the proposed models achieved 97% on the same dataset. This experiment indicates that the approach is suitable for real-world email settings.
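
As a rough illustration of the kind of pipeline described above, the following minimal sketch chains stop-word removal and a crude suffix-stripping step into a multinomial Naïve Bayes classifier with Laplace smoothing. It is not the paper's implementation: the stop-word list, the `crude_stem` helper (a stand-in for a real stemmer such as Porter's), the `NaiveBayes` class, and the four training emails are all assumptions invented for illustration, and lemmatization, SVM, and RF are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "in", "for", "you", "your", "at", "on", "this"}


def crude_stem(token):
    """Strip one common suffix; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = preprocess(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented purely for illustration.
TRAIN = [
    ("Win a free prize now, click here", "spam"),
    ("Cheap loans approved, claim your free offer", "spam"),
    ("Meeting moved to Monday at noon", "ham"),
    ("Please review the attached project report", "ham"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("Claim your free prize"))
print(model.predict("Project meeting report attached"))
```

On this toy data the first message is labelled spam and the second ham. A real evaluation would replace the toy emails with a corpus such as Enron or SpamAssassin and use an established stemmer or lemmatizer.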