2012
DOI: 10.2478/v10177-012-0044-0

Probabilistic Sequence Mining – Evaluation and Extension of ProMFS Algorithm for Real-Time Problems

Abstract: Sequential pattern mining is an extensively studied data mining method. One of the newer and less documented approaches estimates the statistical characteristics of sequences to create model sequences, which can be used to speed up sequence mining. This paper proposes extensive modifications to one such algorithm, ProMFS (probabilistic algorithm for mining frequent sequences), which notably increase the algorithm's processing speed by significantly reducing its computational complexity. A…

Cited by 2 publications (2 citation statements)
References 11 publications
“…FP-growth algorithm avoids constant database scans by creating the tree in the first step and then mining smaller search-space using divide-and-conquer methods.

Data set                   Apriori    GSP  SPARSE  SPADE  SPAM  LAPIN
King James Bible  1%         69994   5418    5645   5214  6781   5514
                  0,03%      86112   6519    6982   6706  9400   6618
Pan Tadeusz       2%          2804    422     437    401   363    428
                  0,02%       4736    647     652    647   580    619
Connect           1%         64658   4519    3850   4421  4536   4028
Chess             0,4%         182     29      23     25    31     27
Pumsb*            0,5%      147208   8143    6415   6999  8548   6330

Sequential pattern mining algorithms are proven to work well and with satisfying speed for different types of data sets [6]-[8], but for large data sets tend to be too slow to be used for real-time solutions [9], [10]. Tests of popular CPU-based algorithms performed on i7 CPU 960@3.2GHz with 32GB RAM under MS VS 2010 were made.…”
Section: Introduction (mentioning)
confidence: 99%
“…GSP algorithm is proven to work well and with satisfying speed for different types of data sets [3]-[5], but as shown in Table I for large data sets the classical GSP algorithm is too slow to be used for real-time solutions. Analogous case is for other sequential pattern mining algorithms [6].…”
Section: Introduction (mentioning)
confidence: 99%