Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile

Yeh, Chin‐Chia Michael; Zhu, Yada; Ulanova, Liudmila; Begum, Nurjahan; Ding, Yifei; Dau, Son Hoang; Zimmerman, Zachary; Silva, Diego Furtado; Mueen, Abdullah; Keogh, Eamonn

doi:10.1007/s10618-017-0519-9

Cited by 106 publications

(83 citation statements)

References 46 publications


“…The values are picked from a Gaussian distribution with mean value and standard deviation randomly selected from [−5, 5] and [0, 2] respectively; • Mixed sine. It is a mixture of several sine waves whose period, amplitude and mean value are randomly chosen from [2,10], [2,10] and [−5, 5] respectively.…”

Section: Methods (mentioning)

confidence: 99%
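
The two synthetic generators described in the statement above can be sketched as follows. This is only an illustration under stated assumptions: the quote does not say how the parameters are sampled (uniform sampling is assumed here), how many sine waves are mixed (`n_waves=3` is a placeholder), or whether the waves carry a phase offset (none is used). The function names are hypothetical and not taken from the cited paper.

```python
import numpy as np


def gaussian_series(n, rng):
    """Draw n values from a Gaussian whose mean and std are themselves random.

    Assumption: mean ~ Uniform[-5, 5] and std ~ Uniform[0, 2].
    """
    mean = rng.uniform(-5.0, 5.0)
    std = rng.uniform(0.0, 2.0)
    return rng.normal(mean, std, size=n)


def mixed_sine_series(n, rng, n_waves=3):
    """Sum several sine waves with random period, amplitude and mean value.

    Assumption: each parameter is sampled uniformly from the quoted range,
    and n_waves is an illustrative default, not a value from the source.
    """
    t = np.arange(n)
    series = np.zeros(n)
    for _ in range(n_waves):
        period = rng.uniform(2.0, 10.0)
        amplitude = rng.uniform(2.0, 10.0)
        mean = rng.uniform(-5.0, 5.0)
        series += mean + amplitude * np.sin(2.0 * np.pi * t / period)
    return series


rng = np.random.default_rng(0)
gaussian = gaussian_series(1000, rng)
mixed = mixed_sine_series(1000, rng)
```

Passing the random generator in explicitly keeps both functions reproducible for a fixed seed.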

“…Furthermore, to verify the universality of this new query type, we investigate the motif pairs in some popular real-world time series benchmarks. Motif mining [2] is an important time series mining task, which finds a pair (or set) of subsequences with minimal normalized distance. For a motif subsequence pair, say X and Y, we show the relative mean value difference ($\Delta\mathrm{Mean} = \frac{|\mu_X - \mu_Y|}{\max - \min}$) and the ratio of standard deviation ($\Delta\mathrm{Std} = \left|\frac{\sigma_X}{\sigma_Y}\right|$) in Fig.…”

Section: Introduction (mentioning)

confidence: 99%
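
As a worked illustration of the two quantities named in the statement above, the sketch below computes the relative mean difference and the standard deviation ratio for one pair of subsequences. The quote does not say what "max" and "min" range over; the sketch assumes they are the maximum and minimum of the whole series. The function name `motif_pair_stats` and the toy data are hypothetical.

```python
import numpy as np


def motif_pair_stats(series, i, j, m):
    """Return (delta_mean, delta_std) for the subsequences starting at i and j.

    Assumption: the normalizing range is max(series) - min(series).
    """
    series = np.asarray(series, dtype=float)
    x = series[i:i + m]
    y = series[j:j + m]
    value_range = series.max() - series.min()
    delta_mean = abs(x.mean() - y.mean()) / value_range
    delta_std = abs(x.std() / y.std())
    return delta_mean, delta_std


# Two subsequences with the same shape but different offset and scale.
toy = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 12.0, 14.0, 16.0])
delta_mean, delta_std = motif_pair_stats(toy, i=0, j=4, m=4)
```

In this toy case the two subsequences are identical after normalization, yet their means differ by a large fraction of the value range and their standard deviations differ by a factor of two.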


KV-Match: A Subsequence Matching Approach Supporting Normalization and Time Warping

Wang, Pan, et al. 2019

2019 IEEE 35th International Conference on Data Engineering (ICDE)


The volume of time series data has exploded due to the popularity of new applications, such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. All index-based approaches only consider raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can deal with normalized subsequence matching problem (NSM), but it needs to scan full time series. In this paper, we propose a novel problem, named constrained normalized subsequence matching problem (cNSM), which adds some constraints to NSM problem. The cNSM problem provides a knob to flexibly control the degree of offset shifting and amplitude scaling, which enables users to build the index to process the query. We propose a new index structure, KV-index, and the matching algorithm, KV-match. With a single index, our approach can support both RSM and cNSM problems under either ED or DTW distance. KV-index is a key-value structure, which can be easily implemented on local files or HBase tables. To support the query of arbitrary lengths, we extend KV-match to KV-matchDP, which utilizes multiple varied-length indexes to process the query. We conduct extensive experiments on synthetic and real-world datasets. The results verify the effectiveness and efficiency of our approach.


“…MASS. MASS [87] is an exact subsequence matching algorithm, which computes the distance between a query, SQ, and every subsequence in the series, using the dot product of the DFT transforms of the series and the reverse of SQ.…”

Section: Similarity Search Methods (mentioning)

confidence: 99%
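
The idea in the statement above, obtaining all query-to-subsequence dot products from the DFTs of the series and of the reversed query, can be sketched with NumPy. This is a minimal illustration rather than the reference MASS implementation: it assumes the z-normalized Euclidean distance, computes the window statistics directly instead of with running sums, and does not guard against constant windows (zero standard deviation). The function names are hypothetical.

```python
import numpy as np


def sliding_dot_product(query, series):
    """Dot product of the query with every length-m window of the series."""
    n, m = len(series), len(query)
    # Reverse the query and zero-pad it to length n, so that multiplying
    # the two DFTs corresponds to a circular convolution.
    q_rev_padded = np.concatenate([query[::-1], np.zeros(n - m)])
    conv = np.fft.irfft(np.fft.rfft(series) * np.fft.rfft(q_rev_padded), n)
    # Entries from index m - 1 onward are free of wrap-around terms.
    return conv[m - 1:]


def mass_distance_profile(query, series):
    """Z-normalized Euclidean distance from the query to every subsequence."""
    query = np.asarray(query, dtype=float)
    series = np.asarray(series, dtype=float)
    m = len(query)
    qt = sliding_dot_product(query, series)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    mu_t = windows.mean(axis=1)
    sigma_t = windows.std(axis=1)
    mu_q, sigma_q = query.mean(), query.std()
    # Pearson correlation between the query and each window.
    corr = (qt - m * mu_q * mu_t) / (m * sigma_q * sigma_t)
    dist_sq = 2.0 * m * (1.0 - corr)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(dist_sq, 0.0))


rng = np.random.default_rng(0)
series = rng.normal(size=256)
query = series[40:56]
profile = mass_distance_profile(query, series)
best_match = int(np.argmin(profile))
```

Because the query here is cut from the series itself, the smallest distance should fall at the position it was taken from, which gives a quick sanity check.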

“…Acknowledgments. We sincerely thank all authors for generously sharing their code, and M. Linardi for his implementation of MASS [87]. Work partially supported by EU project NESTOR (Marie Curie #748945).…”

mentioning

confidence: 99%

The Lernaean Hydra of Data Series Similarity Search

et al. 2018


Increasingly large data series collections are becoming commonplace across many different domains and applications. A key operation in the analysis of data series collections is similarity search, which has attracted lots of attention and effort over the past two decades. Even though several relevant approaches have been proposed in the literature, none of the existing studies provides a detailed evaluation against the available alternatives. The lack of comparative results is further exacerbated by the non-standard use of terminology, which has led to confusion and misconceptions. In this paper, we provide definitions for the different flavors of similarity search that have been studied in the past, and present the first systematic experimental evaluation of the efficiency of data series similarity search techniques. Based on the experimental results, we describe the strengths and weaknesses of each approach and give recommendations for the best approach to use under typical use cases. Finally, by identifying the shortcomings of each method, our findings lay the ground for solid further developments in the field.

…the whole (not a sub-) sequence. This problem represents a common use case across many domains [1, 2, 38, 29]. This work is the most extensive experimental comparison of the efficiency of similarity search methods ever conducted.

Contributions. We make the following contributions:

1. We present a thorough discussion of the data series similarity search problem, formally defining its different variations that have been studied in the literature under diverse and conflicting names. Thus, establishing a common language that will facilitate further work in this area.
2. We include a brief survey of data series similarity search approaches, bringing together studies presented in different communities that have been treated in isolation from each other. These approaches range from smart serial scan methods to the use of indexing, and are based on a variety of classic and specialized data summarization techniques.
3. We make sure that all approaches are evaluated under the same conditions, so as to guard against implementation bias. To this effect, we used implementations in C/C++ for all approaches, and reimplemented in C the ones that were only available in other programming languages. Moreover, we conducted a careful inspection of the code bases, and applied to all of them the same set of optimizations (e.g., with respect to memory management, Euclidean distance calculation, etc.), leading to considerably faster performance.
4. We conduct the first comprehensive experimental evaluation of the efficiency of data series similarity search approaches, using several synthetic and 4 real datasets from diverse domains. In addition, we report the first large scale experiments with carefully crafted query workloads that include queries of varying difficulty, which can effectively stress-test all the approaches. Our results reveal characteristics that have not been reported in the literature, and lead...


“…A twofold improvement in performance compared to SBF was offered by Quick-Motif [16] with preference shifting towards a deterministic approach to motif discovery. More recently still, performance improvements and increased scalability have been achieved through a series of algorithms based on approximation for the Matrix Profile technique: (examples include STAMP [24], STOMP [25] & VALMOD [26]).…”

Section: Motif Discovery Techniques: Summary (mentioning)

confidence: 99%

Financial Time Series: Motif Discovery and Analysis Using VALMOD

Cartwright, Crane, Ruskin 2019

Lecture Notes in Computer Science


Motif discovery and analysis in time series data-sets have a wide-range of applications from genomics to finance. In consequence, development and critical evaluation of these algorithms is required with the focus not just detection but rather evaluation and interpretation of overall significance. Our focus here is the specific algorithm, VALMOD, but algorithms in wide use for motif discovery are summarised and briefly compared, as well as typical evaluation methods with strengths. Additionally, Taxonomy diagrams for motif discovery and evaluation techniques are constructed to illustrate the relationship between different approaches as well as inter-dependencies. Finally evaluation measures based upon results obtained from VALMOD analysis of a GBP-USD foreign exchange (F/X) rate data-set are presented, in illustration.
