Nowadays, social media platforms are a primary source of public communication and information. They have become an integral part of our daily lives, and their user base is expanding rapidly as access reaches more remote locations. Pakistan has around 71.70 million social media users who communicate in Roman Urdu. Alongside this growth in users, there has been a rise in digital bullying, commonly known as cyberbullying. This research focuses on social media users who communicate in Roman Urdu (the Urdu language written in the English alphabet). In this research, we examined cyberbullying behavior on the Twitter platform among users who employ Roman Urdu as their medium of communication. To our knowledge, this is one of the very few studies that address cyberbullying behavior in Roman Urdu. This study aims to identify a suitable model for classifying cyberbullying behavior in Roman Urdu. To begin, the dataset was constructed by extracting data from Twitter using Twitter's API. The targeted data was extracted using Roman Urdu keywords and then annotated as bully or not-bully. The dataset was then pre-processed to reduce noise, including removal of punctuation, stop words, null entries, and duplicates. Following that, features were extracted using two different methods, Count Vectorizer and TF-IDF Vectorizer, and a set of ten supervised learning algorithms, including SVM, MLP, and KNN, was applied to both types of extracted features. The Support Vector Machine (SVM) performed best among the implemented algorithms on both feature sets, scoring 97.8 percent on the TF-IDF features and 93.4 percent on the Count Vectorizer features. The proposed mechanism could help online social apps and chat rooms better detect bullying and design bully-word filters, making cyberspace safer for end users.
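A minimal sketch of the feature-extraction and classification comparison described above, using scikit-learn. The Roman Urdu tweets, labels, and train/test split below are hypothetical placeholders; the paper's actual dataset, keyword list, and full set of ten classifiers are not reproduced here, and only SVM, MLP, and KNN are shown.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hypothetical annotated Roman Urdu tweets: 1 = bully, 0 = not-bully.
texts = ["tum bohat bure ho", "aaj mausam acha hai", "pagal insaan", "shukriya dost"]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42)

# Three of the ten supervised learners mentioned in the abstract.
classifiers = {
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500),
    "KNN": KNeighborsClassifier(n_neighbors=1),
}

# Apply each classifier to both feature representations (CV and TF-IDF).
for vec_name, vectorizer in [("CV", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    X_tr = vectorizer.fit_transform(X_train)
    X_te = vectorizer.transform(X_test)
    for clf_name, clf in classifiers.items():
        clf.fit(X_tr, y_train)
        acc = accuracy_score(y_test, clf.predict(X_te))
        print(f"{vec_name} + {clf_name}: accuracy = {acc:.3f}")
```

On a realistic corpus, the same loop would report the per-classifier scores from which the paper's best-performing combination (SVM on TF-IDF features) could be identified.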
The rapid advancement of internet access and the swift uptake of social networking sites such as Twitter and Facebook have generated an amount of data never seen before in human history. Users are producing and disseminating more content than ever because of the widespread use of digital platforms, some of which is false and spreads misleading narratives. Accurately categorizing a textual document as fabricated or truthful is a complicated task. Most existing research focuses on specific datasets or topics, most notably politics. As a result, the trained approaches perform well on a specific category of documents but do not generalize robustly when evaluated on articles from other areas. In the presented study, we propose a solution to the fake news identification task by combining NLP and machine learning techniques. In the first phase, the data is pre-processed by removing null values, duplicates, and punctuation marks. After this, the cleaned data is converted into a numeric representation using a Count Vectorizer (CV) and a TF-IDF Vectorizer. For the classification task, we used an SVM classifier. Our proposed solution performed well and generalized across fake news domains.
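An illustrative sketch of the pre-processing and classification phases outlined above, assuming the articles are held in a pandas DataFrame. The column names ("text", "label") and the sample rows are hypothetical, not taken from the paper's dataset; TF-IDF features are shown, and a CountVectorizer could be substituted as the abstract describes.

```python
import string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical articles with one null entry and one duplicate.
df = pd.DataFrame({
    "text": ["Breaking: cure found!!!", "Parliament passes budget.", None,
             "Breaking: cure found!!!"],
    "label": ["fake", "real", "real", "fake"],
})

# Phase 1: remove null values, duplicates, and punctuation marks.
df = df.dropna(subset=["text"]).drop_duplicates(subset=["text"])
df["text"] = (df["text"]
              .str.translate(str.maketrans("", "", string.punctuation))
              .str.lower())

# Phase 2: convert the cleaned text into a numeric representation and
# train an SVM classifier on it.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["text"])
clf = LinearSVC()
clf.fit(X, df["label"])

# Classify a new, unseen headline.
new = vectorizer.transform(["shocking secret cure revealed"])
print(clf.predict(new))
```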