This paper describes work on identifying offensive language in social media posts and categorizing a post as targeted at a particular person or not. The system developed by team TECHSSN for Multilingual Offensive Language Identification in Social Media (Task 12) at SemEval-2020 uses deep learning models with BERT embeddings. The dataset is preprocessed and given to a Bidirectional Encoder Representations from Transformers (BERT) model with pretrained weight vectors. The model is then retrained so that its weights are learned for the offensive language dataset. Our system was developed on the English-language dataset. The results are better than those of the model we developed for SemEval-2019 Task 6.
Related Work
Recently, research on detecting profane speech in social media has made great strides, including hate speech detection, offensive language identification, and abusive language detection. Several workshops, such as GermEval, SemEval, HatEval, and TRAC, have drawn the attention of researchers in this field. Research on hate speech includes work by Basile et al. (2019), Fortuna and Nunes (2018), and Malmasi and Zampieri (2017). The difference between profanity and hate speech, and the challenges involved, are discussed in . An offensive language detection system is described by
This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.