This work proposes an approach for detecting novelties that takes into account the temporal flow of data streams in social media. To this end, we present a new architecture for novelty detection, which makes three contributions. First, we propose a definition of novelty based on temporal windows. Second, we formulate an expression to measure the quality of a novelty. Third, we introduce a new approach to fusing heterogeneous data (image + text), using the COCO dataset and the Mask R-CNN convolutional neural network, which transforms images and text from social media into a single data format that machine learning algorithms can process. Because labeled samples are scarce or nonexistent in novelty detection, we use unsupervised algorithms, choosing the following baseline and state-of-the-art methods: kNN, HBOS, FBagging, IForesting, and autoencoders. The new fusion approach is also compared with AOM, a state-of-the-art approach to outlier detection. Because of the temporal particularities and the data types being fused, we created a new dataset containing 27,494 tweets collected from Twitter. Our experiments show that classifying social media data with fused image and text inputs is superior to using only text or only images as input.
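To make the idea of fusing image and text into a single format more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that an object detector has already produced a list of labels per image (the hard-coded label lists stand in for that output), appends those labels to the tweet tokens, and scores each post with a simple kNN distance. The function names `fuse`, `vectorize`, and `knn_scores`, the `img:` prefix, and the toy posts are all hypothetical.

```python
from collections import Counter
import math


def fuse(text, image_labels):
    """Fuse one post into a single token list: text tokens plus object labels.

    Assumption: image_labels stands in for the output of an object detector.
    """
    return text.lower().split() + [f"img:{label}" for label in image_labels]


def vectorize(token_lists):
    """Turn token lists into bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vectors


def knn_scores(vectors, k=2):
    """Outlier score of each point: Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, u) for j, u in enumerate(vectors) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy posts (hypothetical): (tweet text, labels assumed to come from a detector).
posts = [
    ("traffic jam downtown again", ["car", "bus"]),
    ("traffic jam on the bridge", ["car", "truck"]),
    ("heavy traffic downtown today", ["car", "bus"]),
    ("elephant walking on the beach", ["elephant", "umbrella"]),
]

vectors = vectorize([fuse(text, labels) for text, labels in posts])
scores = knn_scores(vectors, k=2)

# Index of the post with the highest outlier score.
most_novel = max(range(len(posts)), key=scores.__getitem__)
print(most_novel, [round(s, 2) for s in scores])
```

In this toy setup the post whose text and image labels both differ from the rest receives the highest score. A real system would replace the bag-of-words step and the brute-force kNN with the feature extraction and detectors described in the paper.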
Abstract: The tasks of answering questions or clarifying doubts depend first on a good analysis of the question in order to identify the subject