A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is likely to change as time goes by. Identifying the recent change of a data stream, especially for an online data stream, can provide valuable information for the analysis of the data stream. In addition, monitoring the continuous variation of a data stream makes it possible to find gradual changes in the embedded knowledge. However, most mining algorithms over a data stream do not differentiate the information of recently generated transactions from the obsolete information of old transactions, which may be no longer useful or possibly invalid at present. This paper proposes a data mining method for adaptively finding recent frequent itemsets over an online data stream. The effect of old transactions on the mining result of the data stream is diminished by decaying the old occurrences of each itemset as time goes by. Furthermore, several optimization techniques are devised to minimize processing time as well as main memory usage. Finally, the proposed method is analyzed by a series of experiments.
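The decay idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `decayed_counts`, the parameters `decay` and `max_size`, and the rule of multiplying every stored count by a fixed factor on each new transaction are all assumptions made for illustration, and the optimization techniques the abstract mentions are not modeled.

```python
from itertools import combinations


def decayed_counts(transactions, decay=0.5, max_size=2):
    """Count itemsets so that older occurrences weigh less (illustrative only).

    Assumption: each arriving transaction multiplies all stored counts by
    `decay` (0 < decay < 1) before its own itemsets are added with weight 1.
    """
    counts = {}   # itemset (sorted tuple) -> decayed occurrence count
    total = 0.0   # decayed number of transactions seen so far
    for tx in transactions:
        # Age the existing information before adding the new transaction.
        total = total * decay + 1.0
        for itemset in list(counts):
            counts[itemset] *= decay
        # Enumerate itemsets up to max_size; a real miner would prune here.
        items = sorted(set(tx))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0.0) + 1.0
    return counts, total


counts, total = decayed_counts([["a", "b"], ["a"], ["a"]], decay=0.5)
# Decayed support of an itemset is its count divided by the decayed total.
support_a = counts[("a",)] / total
support_b = counts[("b",)] / total
```

In this sketch an item that appeared only in the oldest transaction ends up with a much smaller support than one that keeps appearing, which is the effect the abstract attributes to decaying old occurrences.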
Knowledge embedded in a data stream is likely to change as time goes by. Quickly identifying recent changes in this knowledge can provide valuable information for the analysis of the data stream. However, most mining algorithms over a data stream are not able to extract the recent change of knowledge adaptively, because the obsolete information of old data elements, which may be no longer useful or possibly invalid at present, is regarded as being as important as that of recent data elements. This paper proposes a sliding window method that adaptively finds recently frequent itemsets over a transactional online data stream. The size of the sliding window defines the desired lifetime of the information in a newly generated transaction. Consequently, only recently generated transactions within the range of the window are considered when finding the recently frequent itemsets of a data stream.
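The sliding-window idea can be sketched in the same spirit. The class name `SlidingWindowCounter`, its parameters, and the exact-count bookkeeping below are hypothetical and only show how a fixed window size bounds the lifetime of a transaction; the abstract does not describe the method's actual data structures.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Exact itemset counts over the most recent `window_size` transactions."""

    def __init__(self, window_size, max_size=2):
        self.window_size = window_size
        self.max_size = max_size   # largest itemset size enumerated (assumption)
        self.window = deque()      # transactions currently inside the window
        self.counts = Counter()    # itemset -> occurrences inside the window

    def _itemsets(self, tx):
        items = sorted(set(tx))
        for size in range(1, self.max_size + 1):
            yield from combinations(items, size)

    def add(self, tx):
        # Insert the new transaction and count its itemsets.
        self.window.append(tuple(tx))
        self.counts.update(self._itemsets(tx))
        # Evict the oldest transaction once the window is over capacity.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            for itemset in self._itemsets(old):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]

    def frequent(self, min_support):
        # Support is measured against the transactions inside the window only.
        n = len(self.window)
        if n == 0:
            return {}
        return {s: c / n for s, c in self.counts.items() if c / n >= min_support}


miner = SlidingWindowCounter(window_size=2)
for tx in (["a", "b"], ["a"], ["c"]):
    miner.add(tx)
recent = miner.frequent(min_support=0.5)
```

Once the third transaction arrives, the first one leaves the window, so item `b` no longer contributes to any support value.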