The social web appears to enrich human lives by providing effective applications for online social interaction, and microblogs are among its most important applications. Microbloggers who influence other users of a social community through their content, in the form of tweets, are known as influential microbloggers. Identifying such influential microbloggers has many applications in advertising, online marketing, corporate communication, information dissemination, etc. This paper investigates the problem of identifying influential microbloggers by proposing the MIPPLA (Model to identify Influential using Productivity, Popularity and Link Analysis) model, which integrates Productivity and Popularity modules. The Productivity module considers a microblogger’s activity, and the Popularity module identifies a microblogger’s influence in an online social community. In addition, we modify the classic PageRank algorithm to use Twitter features such as retweets, mentions, and replies for ranking influential users. The proposed approaches are evaluated on real-world social networks. The results show that the MIPPLA model identifies and ranks the top influential users more effectively than existing techniques.
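The abstract does not spell out how PageRank is modified, so the following is only a minimal sketch of one plausible reading: a weighted PageRank over a directed interaction graph in which an edge from user A to user B means A retweeted, mentioned, or replied to B. The function name `interaction_pagerank`, the `INTERACTION_WEIGHTS` table, and its equal weights are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical per-interaction weights; the abstract gives no values.
INTERACTION_WEIGHTS = {"retweet": 1.0, "mention": 1.0, "reply": 1.0}


def interaction_pagerank(interactions, damping=0.85, iterations=50):
    """Rank users with a weighted PageRank over Twitter-style interactions.

    `interactions` is an iterable of (source_user, target_user, kind) tuples,
    where `kind` is a key of INTERACTION_WEIGHTS. An edge source -> target
    passes part of the source's rank to the target.
    """
    out_weight = defaultdict(lambda: defaultdict(float))
    users = set()
    for src, dst, kind in interactions:
        users.update((src, dst))
        out_weight[src][dst] += INTERACTION_WEIGHTS[kind]

    if not users:
        return {}

    n = len(users)
    rank = {user: 1.0 / n for user in users}
    for _ in range(iterations):
        # Users with no outgoing interactions spread their rank evenly.
        dangling = sum(rank[u] for u in users if u not in out_weight)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for src, targets in out_weight.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    sample = [
        ("alice", "carol", "retweet"),
        ("bob", "carol", "mention"),
        ("carol", "alice", "reply"),
    ]
    scores = interaction_pagerank(sample)
    for user, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{user}: {score:.4f}")
```

Giving retweets, mentions, and replies different weights would change how strongly each interaction type transfers rank; the equal weights here simply reduce the sketch to PageRank on a multigraph of interactions.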
With the emergence of big data and growing interest in deriving valuable insights from ever-growing and ever-changing streams of data, machine learning has emerged as an effective data analytics technique compared to traditional methodologies. Big data has become a source of incredible business value for almost every industry. In this context, machine learning plays an indispensable role in providing smart data analysis capabilities for uncovering hidden patterns. These patterns are then used to automate certain aspects of decision-making processes using machine learning classifiers. This paper presents a state-of-the-art comparative analysis of machine learning and deep learning-based classifiers for multiclass prediction. The experimental setup consists of 11 datasets from different domains, publicly available in the UCI and Kaggle repositories. The classifiers include Naïve Bayes (NB), decision trees (DTs), random forest (RF), gradient boosted decision trees (GBDTs), and deep learning-based convolutional neural networks (CNNs). The results show that the ensemble-based GBDTs outperform the other algorithms in terms of accuracy, precision, and recall. RF and CNN show similar performance on most datasets and outperform the traditional NB and DTs, while NB shows the lowest performance overall. Notably, DTs show the lowest precision score on the Titanic dataset; one of the main reasons is that DTs suffer from overfitting and use a greedy approach for attribute relationship analysis.
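The comparison above rests on accuracy, precision, and recall for multiclass prediction. The abstract does not state how the per-class scores are averaged, so the sketch below assumes macro averaging (an unweighted mean over classes); the function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for multiclass labels.

    Macro averaging is an assumption here: each class contributes equally,
    regardless of how many samples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # A class that is never predicted (or never present) scores 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


if __name__ == "__main__":
    truth = ["a", "a", "b", "c"]
    predicted = ["a", "b", "b", "c"]
    acc, prec, rec = macro_scores(truth, predicted)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Applying the same function to every classifier's predictions on a held-out split gives directly comparable numbers, which is the kind of side-by-side comparison the abstract reports.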