Most real-world databases consist of historical and numerical data, such as sensor, scientific or even demographic data. In this context, classical sequential pattern mining algorithms, which are well suited to the temporal aspect of data, cannot process numerical information. The data are therefore pre-processed into a binary representation, which leads to a loss of information. Fuzzy algorithms have been proposed to process numerical data using intervals, particularly fuzzy intervals, but none of these methods is satisfactory. This paper therefore completely defines the concepts linked to fuzzy sequential pattern mining. Using different fuzzification levels, we propose three methods to mine fuzzy sequential patterns and detail the resulting algorithms (SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY). Finally, we assess them through different experiments, which show the robustness and relevance of this work.
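To illustrate why a binary representation loses information that fuzzy intervals keep, here is a minimal Python sketch. It is not taken from the paper: the `trapezoid` helper, the `PARTITION` of a temperature reading into "low", "medium" and "high", and all breakpoints are hypothetical choices made for this example only.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set.

    The degree is 0 outside [a, d], 1 on [b, c], and linear in between.
    Setting a == b or c == d gives a left or right shoulder.
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy partition of a temperature reading (assumed values).
PARTITION = {
    "low": (0, 0, 15, 25),
    "medium": (15, 25, 25, 35),
    "high": (25, 35, 50, 50),
}


def fuzzify(x):
    """Map a numerical value to its degree in every fuzzy interval."""
    return {label: trapezoid(x, *params) for label, params in PARTITION.items()}


def binarize(x):
    """Crisp alternative: keep only the label with the highest degree."""
    degrees = fuzzify(x)
    best = max(degrees, key=degrees.get)
    return {label: label == best for label in degrees}


if __name__ == "__main__":
    for reading in (24, 30):
        print(reading, fuzzify(reading), binarize(reading))
```

In this sketch a reading of 24 belongs mostly to "medium" and slightly to "low", whereas the binarized form records only "medium"; that discarded degree is the kind of information a fuzzy representation retains.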
Industrial databases often contain a large amount of missing information. When these data are mined for frequent sequences, incomplete records are usually deleted, which leads to a significant loss of information. The extracted knowledge then becomes less representative of the database. Two techniques can be investigated: either using only the available information or estimating the missing values. In this paper we propose an estimation-based approach that represents the inclusion of an item within a record by a fuzzy set. The membership degree giving the item inclusion is then used to compute the frequency of a sequence. Experiments run on various synthetic datasets show the feasibility and validity of our proposal, in terms of both quality and robustness to the rate of missing values.
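The idea of computing a sequence frequency from membership degrees can be sketched as follows. This is only one possible reading, under stated assumptions: each record is a dictionary mapping an item to its inclusion degree (1.0 for an observed item, a lower value for an estimated one), the degrees along an occurrence are combined with `min`, the best occurrence per data sequence is kept, and the frequency is the average over the database. The function names and the sample data are hypothetical, and the estimation step that would produce the degrees is not shown.

```python
def best_degree(sequence, pattern):
    """Best degree with which `pattern` occurs, in order, in `sequence`.

    `sequence` is a list of records, each a dict item -> inclusion degree.
    `pattern` is a list of items that must appear in strictly later records.
    """
    # best[j] = best degree found so far for the first j pattern items.
    best = [1.0] + [0.0] * len(pattern)
    for record in sequence:
        # Go backwards so one record matches at most one pattern item.
        for j in range(len(pattern), 0, -1):
            degree = record.get(pattern[j - 1], 0.0)
            best[j] = max(best[j], min(best[j - 1], degree))
    return best[len(pattern)]


def fuzzy_frequency(database, pattern):
    """Average, over all data sequences, of the best occurrence degree."""
    if not database:
        return 0.0
    return sum(best_degree(seq, pattern) for seq in database) / len(database)


if __name__ == "__main__":
    # Hypothetical database: degrees below 1.0 stand for estimated items.
    database = [
        [{"a": 1.0}, {"b": 0.6}],
        [{"a": 0.5}, {"b": 1.0}],
        [{"b": 1.0}, {"a": 1.0}],
    ]
    print(fuzzy_frequency(database, ["a", "b"]))
```

With this choice, a sequence containing an estimated item still contributes to the frequency, but only in proportion to how certain the inclusion is, rather than being deleted outright.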
Many applications require techniques for temporal knowledge discovery. Some of these approaches can handle time constraints between events. In particular, some work has been done to mine generalized sequential patterns. However, such constraints are often too crisp or need a very precise assessment to avoid erroneous information. Therefore, in this paper we propose to soften the temporal constraints used for generalized sequential pattern mining. To handle these constraints during mining, we design an algorithm based on sequence graphs. Moreover, as these relaxed constraints may extract more generalized patterns, we propose a temporal accuracy measure to help analyze the numerous discovered patterns.
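As a rough illustration of what softening a crisp temporal constraint could look like, the sketch below relaxes a maximum-gap constraint: gaps up to `max_gap` are fully accepted, and acceptance then decreases linearly to zero over an extra `slack`. The linear shape, the `slack` parameter and the `occurrence_degree` aggregation by `min` are assumptions made for this example; they are not the paper's definitions, and the sequence-graph algorithm itself is not reproduced here.

```python
def soft_max_gap(gap, max_gap, slack):
    """Degree to which `gap` satisfies a softened maximum-gap constraint.

    Returns 1.0 when gap <= max_gap, 0.0 when gap >= max_gap + slack,
    and a linearly decreasing value in between.
    """
    if gap <= max_gap:
        return 1.0
    if gap >= max_gap + slack:
        return 0.0
    return (max_gap + slack - gap) / slack


def occurrence_degree(timestamps, max_gap, slack):
    """Worst constraint satisfaction over consecutive event timestamps.

    This is a hypothetical per-occurrence score, used here only to show
    how graded constraints can rank the patterns that are discovered.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return min((soft_max_gap(g, max_gap, slack) for g in gaps), default=1.0)


if __name__ == "__main__":
    # Events at days 0, 4 and 10, with a 5-day maximum gap and 2 days of slack.
    print(occurrence_degree([0, 4, 10], max_gap=5, slack=2))
```

A crisp 5-day constraint would reject the occurrence above because of its 6-day gap, while the softened version keeps it with a reduced degree, which can then serve to order the larger set of patterns that relaxed constraints produce.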