Owing to technological development, the internet has become the world's largest platform, where an uncountable amount of e-news information is freely available. E-newspaper readers often have to sift through massive collections of e-news articles to locate the information relevant to them. Large volumes of semi-structured and unstructured text can mislead readers when they search for and try to understand data. Furthermore, manually reading a collection of e-news articles to extract knowledge is tedious and unproductive. The literature on Knowledge Discovery from text documents has advanced substantially in this regard, and Association Rule Extraction from text documents in particular has become a frequent and important research approach for finding the most significant information, patterns, and features in text documents while reducing the time needed to read them all. This study provides a comprehensive review of Association Rule extraction from textual data, covering the essential topics: pre-processing, the steps in Association Rule Mining, and rule mining algorithms. Of the various existing association rule mining algorithms, the two most important, Apriori and FP Growth, are chosen for an experiment using e-news articles. Based on the experimental results, this study discusses the performance, significant bottlenecks, and recent breakthroughs of rule mining algorithms, and finally the prospective directions to facilitate future research.
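For readers unfamiliar with Apriori, the level-wise search it performs can be illustrated with a minimal sketch. This is not the implementation used in the study: the function name `apriori`, the toy keyword sets in `docs`, and the 0.5 support threshold are assumptions made purely for illustration, with each document reduced to a set of keywords after pre-processing.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct item as a singleton itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical keyword sets standing in for pre-processed e-news articles.
docs = [
    {"election", "vote", "party"},
    {"election", "vote"},
    {"election", "party"},
    {"cricket", "match"},
]
itemsets = apriori(docs, min_support=0.5)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its subsets were already found frequent. FP Growth avoids this candidate generation altogether by compressing the transactions into a prefix tree, which is not shown here.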
Tea is an age-old industry in Sri Lanka that is highly regulated by the Ceylon Tea Board. The country's tea business has deep roots and is among its major economic contributors. Tea brokers facilitate the weekly tea auction in Colombo year-round and act as advisors to the factories by predicting demand for different varieties of tea. However, these predictions are not systematically planned and are driven by the brokers' instinct. This research focuses on extracting seasonal demands from tea auction history records to help brokers predict demand at upcoming sales. Initial exploration of the dataset gives better insight into the tea auction. This familiarity with the data, combined with domain knowledge of the industry, is then applied to the sample dataset using the association rule mining technique. The outcomes are promising and reveal information hidden in the tea auction dataset that could not be extracted using traditional analysis. Findings suggest that selected manufacturing factories display similarities in some sale months in terms of the price range of the auctioned teas, providing clues to possible seasonal demands in the dataset. Most of all, the study opens up the tea auction dataset to further pattern discovery of future tea demands.
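A rule of the kind described above, linking a factory and sale month to a price range, is usually judged by its support, confidence, and lift. The sketch below shows how these measures could be computed. The attribute names (`factory=`, `month=`, `price=`) and the four records are invented for illustration and are not taken from the auction dataset.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    # Support: share of all records in which the whole rule holds.
    support = n_both / n
    # Confidence: share of antecedent records that also match the consequent.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift: confidence relative to the consequent's baseline frequency.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical auction records, each encoded as a set of attribute=value items.
records = [
    frozenset({"factory=A", "month=Jan", "price=high"}),
    frozenset({"factory=A", "month=Jan", "price=high"}),
    frozenset({"factory=A", "month=Jun", "price=low"}),
    frozenset({"factory=B", "month=Jan", "price=low"}),
]
support, confidence, lift = rule_metrics(
    records,
    antecedent=frozenset({"factory=A", "month=Jan"}),
    consequent=frozenset({"price=high"}),
)
```

A lift above 1 indicates that the factory and month combination occurs with the price range more often than the price range's overall frequency would suggest, which is the kind of signal a seasonal-demand analysis would look for.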