Collecting valid information from electronic sources to detect a potential outbreak of infectious disease is time-consuming and labor-intensive. Automated identification of relevant information using machine learning is therefore needed to respond to a potential outbreak. A total of 2864 documents were collected from various websites and then manually categorized and labeled by two reviewers. Labels for the training and test data were assigned based on reviewer consensus. Two machine learning algorithms (ConvNet and bidirectional long short-term memory, BiLSTM) and two classification methods (DocClass and SenClass) were used to classify the documents. Precision, recall, F1, accuracy, and area under the curve were measured to evaluate the performance of each model. With DocClass, ConvNet yielded higher average, minimum, and maximum accuracies (87.6%, 85.2%, and 91.1%, respectively) than BiLSTM, while with SenClass, BiLSTM performed better than ConvNet, with average, minimum, and maximum accuracies of 92.8%, 92.6%, and 93.3%, respectively. BiLSTM with SenClass achieved an overall accuracy of 92.9% in classifying infectious disease occurrences. Machine learning performed comparably to a human expert given a particular text extraction system. This study suggests that analyzing information from websites using machine learning can achieve high accuracy when abundant articles and documents are available.
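As a minimal sketch of how the threshold-based metrics named above (precision, recall, F1, and accuracy) are computed for a binary relevant/not-relevant classifier, consider the following. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not the study's data or code, and the area under the curve is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall, F1, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = document reports a disease occurrence, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With these toy labels there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out to 0.75.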
With the development of the Internet of Things (IoT), both the types and the amounts of spatial data collected from heterogeneous IoT devices are increasing. This growing body of spatial data is being actively used in data mining. Existing association rule mining algorithms find items with high correlation across the entire dataset. However, association rules that differ from region to region may be missed when rules are mined over all the data at once. In this paper, we propose region-based frequent pattern growth (RFP-Growth), which searches for association rules within dense regions. First, RFP-Growth divides item transactions that include position data into regions using a density-based clustering algorithm. Second, frequent pattern growth (FP-Growth) is performed on the transactions of each region. The experimental results show that RFP-Growth discovers new association rules that the original FP-Growth cannot find in the whole data.
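The two-step idea above (cluster transactions by position, then mine each region separately) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so a small DBSCAN is assumed here, and a brute-force itemset counter stands in for FP-Growth (it returns the same frequent itemsets but does not build an FP-tree and is exponential in transaction length). All function names, parameters, and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one cluster label per point, -1 for noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # noise (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-Growth: itemset -> support count."""
    counts = defaultdict(int)
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, len(unique) + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    threshold = min_support * len(transactions)
    return {c: n for c, n in counts.items() if n >= threshold}


def region_frequent_itemsets(records, eps, min_pts, min_support):
    """Cluster (position, items) records, then mine each dense region."""
    labels = dbscan([pos for pos, _ in records], eps, min_pts)
    regions = defaultdict(list)
    for label, (_, items) in zip(labels, records):
        if label != -1:               # noise points belong to no region
            regions[label].append(items)
    return {label: frequent_itemsets(txns, min_support)
            for label, txns in regions.items()}


# Hypothetical data: two spatially separated groups of transactions.
records = [
    ((0, 0), ["bread", "milk"]),
    ((0, 1), ["bread", "milk", "eggs"]),
    ((1, 0), ["bread", "milk"]),
    ((1, 1), ["eggs"]),
    ((10, 10), ["beer", "chips"]),
    ((10, 11), ["beer", "chips"]),
    ((11, 10), ["beer", "chips", "milk"]),
    ((11, 11), ["beer"]),
]
global_sets = frequent_itemsets([items for _, items in records], 0.5)
by_region = region_frequent_itemsets(records, eps=1.5, min_pts=3,
                                     min_support=0.5)
print(sorted(global_sets))
print({label: sorted(sets) for label, sets in by_region.items()})
```

In this toy data, no item pair reaches 50% support over all eight transactions, while the pairs (bread, milk) and (beer, chips) each reach it inside their own region, which is the kind of region-specific pattern the abstract describes.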
With the growth of artificial intelligence technology, the importance of recommender systems that recommend personalized content has increased. A typical recommender system analyzes users' log information to provide them with content they are interested in. However, to give users more suitable and personalized content, additional information must be considered besides the users' log information. In the present study, we develop a hybrid recommender system that unifies similarity models (collaborative and content-based) with Markov chains for sequential recommendation, called U2CMS. U2CMS takes into account both sequential patterns and information about content to find accurate relationships between items. It uses a higher-order Markov chain to model sequential patterns over several time steps, together with the textual information of the content. To show the effectiveness of U2CMS with regard to handling sparsity issues, different N-ordered Markov chains, and accurately identifying similarities between items, we carried out several experiments on various Amazon datasets. Our results show that U2CMS not only outperforms existing state-of-the-art recommendation systems (including deep-learning-based systems) but also handles sparsity issues better than other approaches. Moreover, U2CMS appears to perform stably across different N-ordered Markov chains. Lastly, through visualization, we show the success of our proposed content-based filtering model in identifying similar items.

INDEX TERMS Item Similarity Model, Content-based Filtering, Hybrid Recommendation, Sequential Recommendation

HONG-JUN JANG received the B.Sc. and Ph.D. degrees in computer science education from Korea University, Seoul, South Korea. His research interests include data mining, machine learning, and databases.
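The higher-order Markov chain mentioned in the U2CMS abstract above can be illustrated with a small counting model. This is only a sketch of the sequential component under simple assumptions (raw transition counts, no smoothing); it does not include the collaborative or content-based similarity models that U2CMS combines with it, and the function names and item sequences are hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov(sequences, order):
    """Count which item follows each run of `order` consecutive items."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            transitions[context][seq[i]] += 1
    return transitions


def recommend(transitions, history, order, top_k=1):
    """Return the most frequent next items after the last `order` items."""
    context = tuple(history[-order:])
    followers = transitions.get(context, Counter())
    return [item for item, _ in followers.most_common(top_k)]


# Hypothetical purchase sequences.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "case", "charger"],
    ["phone", "case", "screen"],
    ["laptop", "case", "sleeve"],
]
history = ["laptop", "case"]

first_order = recommend(fit_markov(sequences, 1), history, 1)
second_order = recommend(fit_markov(sequences, 2), history, 2)
print(first_order, second_order)
```

Here the first-order model sees only "case" and suggests "charger", while the second-order model also sees "laptop" and suggests "sleeve", which shows why conditioning on several previous time steps can change the recommendation.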