2019
DOI: 10.1007/s10115-019-01393-8

ProSecCo: progressive sequence mining with convergence guarantees

Abstract: We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: it processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the f…

Cited by 11 publications (31 citation statements)
References: 35 publications
“…The SPMF implementation of these algorithms is considered to be of state-of-the-art quality and it is widely used in testing pattern mining algorithms (see, e.g., the Citations page on the SPMF webpage). In the preliminary version of this work [28] we used home-grown C# implementations of PrefixSpan and ProSecCo, and the memory usage patterns of the two algorithms were almost identical to those that we report here.…”
Section: Memory Usage (mentioning)
confidence: 88%
“…This version of our work differs in many ways from the preliminary one that appeared in the proceedings of IEEE ICDM'18 [28]. The major changes are the following, listed approximately in order of importance:…”
Section: Related Work (mentioning)
confidence: 99%
“…The first meaning is sample as a small random sample of a large dataset: since mining patterns becomes more expensive as the dataset grows, it is reasonable to mine only a small random sample that fits into the main memory of the machine. Recently, this meaning of sample as "sample-of-the-dataset" has been used also to enable interactive data exploration using progressive algorithms for pattern mining [22]. The patterns obtained from the sample are an approximation of the exact collection, due to the noise introduced by the sampling process.…”
Section: Introduction (mentioning)
confidence: 99%
“…To obtain desirable probabilistic guarantees on the quality of the approximation, one must study the trade-off between the size of the sample and the quality of the approximation. Many works have progressively obtained better characterizations of the trade-off using advanced probabilistic concepts [7,17,18,20,22,26]. Recent methods [17,18,20,22] use VC-dimension, pseudodimension, and Rademacher averages [4,14], key concepts from statistical learning theory [28] (see also Sect.…”
Section: Introduction (mentioning)
confidence: 99%