In this study, we propose dynamic model update methods for adaptive classification of text streams in a distributed learning environment. In particular, we present two model update strategies: (1) the entire model update and (2) the partial model update. The former aims to maximize model accuracy by periodically rebuilding the model on the accumulated datasets, including recent ones. Its learning time grows as the datasets grow, but we alleviate this overhead through distributed learning of the model. The latter fine-tunes the model with only a limited number of recent datasets, noting that data streams depend on recent events. It therefore accelerates learning while maintaining a certain level of accuracy. To verify the proposed update strategies, we apply them not only to fully trainable language models based on CNN, RNN, and Bi-LSTM, but also to a pre-trained embedding model based on BERT. Through extensive experiments using two real tweet streaming datasets, we show that the entire model update improves the classification accuracy of the pre-trained offline model; the partial model update also improves it, achieving accuracy comparable to that of the entire model update while significantly increasing the learning speed. We also validate the scalability of the proposed distributed learning architecture by showing that model learning and inference times decrease as the number of worker nodes increases.
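The difference between the two update strategies can be illustrated by which data each one trains on. The following is a minimal sketch, not the authors' implementation: the function name `select_training_batches`, the `window` parameter, and the `(text, label)` batch format are all assumptions introduced here for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical batch format: a list of (text, label) pairs collected in one period.
Batch = List[Tuple[str, int]]


def select_training_batches(
    batches: Sequence[Batch], strategy: str, window: int = 1
) -> List[Batch]:
    """Pick the batches a model update would train on.

    "entire":  all accumulated batches (the model is rebuilt from scratch).
    "partial": only the `window` most recent batches (the model is fine-tuned).
    """
    if strategy == "entire":
        return list(batches)
    if strategy == "partial":
        if window < 1:
            raise ValueError("window must be at least 1")
        return list(batches[-window:])
    raise ValueError(f"unknown strategy: {strategy!r}")


# Three periods of (made-up) stream data, oldest first.
stream = [
    [("old tweet", 0)],
    [("newer tweet", 1)],
    [("latest tweet", 0)],
]

entire_data = select_training_batches(stream, "entire")
partial_data = select_training_batches(stream, "partial", window=2)
```

Under this sketch, the training cost of the entire update grows with the length of `stream`, while the partial update is bounded by `window`, which mirrors the speed and accuracy trade-off described in the abstract.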
In this paper, we deal with the problem of judging the credibility of movie reviews. The problem is challenging because even experts cannot clearly and efficiently judge the credibility of a movie review, and the number of movie reviews is very large. To attack this problem, we propose a weakly supervised learning method for fast annotation. As the predefined criterion for weakly supervised learning, we present a simple and clear criterion based on the historical movie ratings associated with movie reviewers. The proposed method has the following two advantages. First, it is highly efficient because we can annotate the entire dataset according to the predefined rule. Indeed, we show that the proposed method can annotate 8,000 movie reviews in only 0.712 seconds. Second, the criterion adopted for weakly supervised learning is simple but effective. As a comparison, we use a learning method that takes the helpfulness votes of other reviewers as the criterion for judging the credibility of movie reviews, an approach that has been widely used to judge the credibility of online reviews. We show that the proposed learning method is comparable to or even better than the helpfulness vote method, improving on the accuracy of the latter by 1.57% to 4.54%.