In an era of information overload, people need more effective ways to filter useful information from large volumes of data. Email, one of the most frequently used forms of communication, carries important messages but also fake news, misinformation, and scams, collectively known as spam. Separating spam from non-spam emails by hand costs considerable time, money, and other human and material resources. To address this, deep learning, and natural language processing (NLP) models in particular, are used to categorize emails faster and more cheaply. The NLP model used here is Bidirectional Encoder Representations from Transformers (BERT). Because BERT is already pre-trained, the main task is to fine-tune it on a dataset of around 5000 emails (85% spam and 15% non-spam). The fine-tuned model is then tested on a group of 5 emails: 3 commercial/spam emails and 2 non-spam emails. The results show that the model separates the two groups, giving the commercial emails scores closer to 1 (ranging from 0.5 to 0.7) and the non-spam emails scores close to 0 (ranging from 0 to 0.1). Therefore, it can be concluded that this model works on small sets of data.
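
As an illustration of how the reported score ranges (0.5 to 0.7 for commercial emails, 0 to 0.1 for non-spam emails) could be turned into labels, the following is a minimal sketch. The function name `label_email` and the 0.5 cut-off are assumptions made for this example; the text does not state which decision threshold, if any, was applied.

```python
# Hypothetical helper: the 0.5 threshold is an assumption for illustration,
# not a value taken from the study.
def label_email(score: float, threshold: float = 0.5) -> str:
    """Map a model score in [0, 1] to a 'spam' or 'non-spam' label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "spam" if score >= threshold else "non-spam"


# Endpoints of the score ranges reported in the text
# (these are range bounds, not individual model outputs).
commercial_range = (0.5, 0.7)
non_spam_range = (0.0, 0.1)

for score in commercial_range + non_spam_range:
    print(f"score={score:.1f} -> {label_email(score)}")
```

Under this assumed threshold, both reported ranges fall on opposite sides of the cut-off, although the lower end of the commercial range sits exactly on the boundary, so a different threshold choice could change how such borderline emails are labeled.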