Text classification is the process of assigning categories or tags to a document based on its content. Although text classification is a well-known task, it involves many steps that require tuning to improve the underlying mathematical models. This article presents a novel methodology and highlights key points for improving text classification performance using learning-based algorithms and techniques. First, to check the effectiveness of the proposed methodology, we selected two public Turkish news benchmarking datasets. Then, we performed extensive testing using both supervised machine learning algorithms and state-of-the-art pre-trained language models. The experimental results show that our methodology outperforms previous news classification studies on these benchmarking datasets, improving categorization results as measured by F1-score. Therefore, we conclude that the presented methodology efficiently improves the classification results and selects a feasible classifier for a given dataset.
Information spreads as individuals engage with other users in the underlying social network. Analysis of social engagements can therefore provide insight into how and why users engage with others in different activities. In this study, we aim to understand the driving factors behind four engagement types on Twitter: like, reply, retweet, and quote. We extensively analyze a diverse set of features that reflect user behaviors, as well as tweet attributes and semantics obtained through natural language processing, including a deep learning language model, BERT. The performance of these features is assessed in a supervised engagement prediction task, learning social engagements from over 14 million multilingual tweets. In light of our experimental results, we find that users engage with tweets based on text semantics and content regardless of the tweet author, yet popular and trusted authors could be important for reply and quote. Users who actively liked and retweeted in the past are likely to maintain this behavior in the future, while this trend is not seen in the more complex engagement types, reply and quote. Moreover, users do not necessarily follow the behavior of other users with whom they have previously engaged. We further discuss the social insights obtained from the experimental results to better understand user behavior and social engagements in online social networks.
Supplementary Information
The online version contains supplementary material available at 10.1007/s13278-022-00872-1.