Online hate speech is one of the negative impacts of the development of internet-based social media. Hate speech occurs because the public lacks a clear understanding of the difference between criticism and hate speech. The Indonesian government regulates hate speech, and most existing research on hate speech focuses only on feature extraction and classification methods. Therefore, this paper proposes methods to identify hate speech before a crime occurs. It presents an approach that detects hate speech by expanding synonyms in the word embedding, and it compares classification results for Word2Vec and FastText embeddings with a bidirectional long short-term memory classifier, with and without the synonym expansion process. The goal is to classify text as hate speech or non-hate speech. The best accuracy without synonym expansion is 0.90; with synonym expansion it is 0.93.
Background: Twitter is one of the most used social media platforms, with 310 million monthly active users and 500 million tweets per day. It is used not only to discuss trending topics but also to share information about accidents, fires, traffic jams, and similar incidents. People often find these updates useful for minimizing the impact of such incidents.
Objective: The current study compares the effectiveness of three deep learning methods (CNN, RCNN, CLSTM) combined with NeuroNER in classifying multi-label incidents.
Methods: NeuroNER is paired with different deep learning classification methods (CNN, RCNN, CLSTM).
Results: CNN paired with NeuroNER yields the best results for multi-label classification compared to CLSTM and RCNN.
Conclusion: CNN proved more effective, with an average precision of 88.54% for multi-label incident classification. This is because the data used for classification were the output of NER, in the form of entity labels. CNN immediately distinguishes the important information, namely the NER labels. CLSTM produces the worst result because it is better suited to sequential data. Future research would benefit from varying the classification parameters and testing scenarios with different numbers of labels and more diverse data.
Keywords: CLSTM, CNN, Incident Classification, Multi-label Classification, RCNN
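The abstract reports an average precision for multi-label incident classification but does not say how the average was taken. As an illustration only, the sketch below assumes one common reading: precision is computed per label and then averaged across labels (macro averaging). The label names, example data, and function names are hypothetical and do not come from the study.

```python
from typing import Dict, List, Set


def per_label_precision(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Dict[str, float]:
    """Precision per label: correct predictions of a label / all predictions of it.

    Labels that are never predicted are left out, because their precision
    is undefined (division by zero).
    """
    predicted: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for truth, pred in zip(y_true, y_pred):
        for label in pred:
            predicted[label] = predicted.get(label, 0) + 1
            if label in truth:
                correct[label] = correct.get(label, 0) + 1
    return {label: correct.get(label, 0) / n for label, n in predicted.items()}


def macro_precision(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-label precision values."""
    scores = per_label_precision(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical example: each tweet carries a set of incident labels.
y_true = [{"accident", "traffic_jam"}, {"fire"}, {"traffic_jam"}, {"accident"}]
y_pred = [{"accident", "traffic_jam"}, {"fire", "accident"}, {"traffic_jam"}, {"fire"}]

scores = per_label_precision(y_true, y_pred)
average = macro_precision(y_true, y_pred)
```

Because each tweet can carry several labels at once, a prediction is scored label by label rather than as a single right-or-wrong answer. Other averaging schemes (micro or per-sample) would generally give different numbers for the same predictions.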