Finding Time Series Motifs in Disk-Resident Data

Mueen, Abdullah; Keogh, Eamonn; Bigdely-Shamlo, Nima

doi:10.1109/icdm.2009.15

Cited by 38 publications (41 citation statements)

References 26 publications (59 reference statements)


“…Subsequence similarity search, the task of finding a region of much longer time series that matches a specified query time series within a given threshold, is a fundamental subroutine in many higher level data mining tasks such as motif discovery [19], anomaly detection [4], association discovery, and classification [20][1] [33].…”

Section: Introduction (mentioning)

confidence: 99%

“…To consider one concrete example, time series motif discovery is a useful tool with applications in dozens of domains. A recent paper introduced a technique to find motifs in datasets containing millions of objects in just hours, a significant speed-up [19]. This method explicitly assumes the Euclidean Distance; however, for the related problem of classification, it is well-known that DTW is significantly more accurate [7][25] [33].…”

Section: Introduction (mentioning)

confidence: 99%


Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs

Sart, Mueen, Najjar, et al. 2010

2010 IEEE International Conference on Data Mining

Self Cite

110


Abstract: Many time series data mining problems require subsequence similarity search as a subroutine. While this can be performed with any distance measure, and dozens of distance measures have been proposed in the last decade, there is increasing evidence that Dynamic Time Warping (DTW) is the best measure across a wide range of domains. Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy. Proposed speedup techniques include early abandoning strategies, lower-bound based pruning, indexing and embedding. In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions if we are to tackle the many problems that are currently untenable even with state-of-the-art algorithms running on high-end desktops. With this motivation, we investigate both GPU (Graphics Processing Unit) and FPGA (Field Programmable Gate Array) based acceleration of subsequence similarity search under the DTW measure. As we shall show, our novel algorithms allow GPUs, which are typically bundled with standard desktops, to achieve two orders of magnitude speedup. For problem domains which require even greater scale up, we show that FPGAs costing just a few thousand dollars can be used to produce four orders of magnitude speedup. We conduct detailed case studies on the classification of astronomical observations and similarity search in commercial agriculture, and demonstrate that our ideas allow us to tackle problems that would be simply untenable otherwise.
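For readers unfamiliar with the measure this abstract refers to, the following is a minimal sketch of the textbook DTW dynamic program, not the GPU or FPGA algorithms of the cited paper. The function name `dtw_distance`, the squared point cost, and the optional `window` parameter (a Sakoe-Chiba style band) are illustrative assumptions.

```python
import math

def dtw_distance(a, b, window=None):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Uses a squared point-wise cost and returns the square root of the
    cheapest warping path. `window` is a hypothetical parameter that
    restricts how far the path may stray from the diagonal.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)          # no constraint
    window = max(window, abs(n - m))  # band must be wide enough to reach the corner
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0                   # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])
```

The quadratic cost of filling this table for every candidate subsequence is what the speedup techniques listed in the abstract try to avoid.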


“…The approach of [9] is the first tractable exact motif discovery algorithm based on the combination of early abandoning the Euclidean distance calculation and a heuristic search guided by the linear ordering of data. The authors also introduced for the first time a disk-aware algorithm for exact motif discovery for massive disk-resident datasets [11]. Although there has been significant research effort spent on efficiently discovering time series motifs, most of the literature has focused on fast and scalable approximate or exact algorithms for finding motifs in static offline databases.…”

Section: Related Work (mentioning)

confidence: 99%

Online Discovery of Top-k Similar Motifs in Time Series Data

Hoang, Ninh, Calders 2011

Proceedings of the 2011 SIAM International Conference on Data Mining


A motif is a pair of non-overlapping sequences with very similar shapes in a time series. We study the online top-k most similar motif discovery problem. A special case of this problem corresponding to k = 1 was investigated in the literature by Mueen and Keogh [2]. We generalize the problem to any k and propose space-efficient algorithms for solving it. We show that our algorithms are optimal in terms of space. In the particular case when k = 1, our algorithms achieve better performance both in terms of space and time consumption than the algorithm of Mueen and Keogh. We demonstrate our results by both theoretical analysis and extensive experiments with both synthetic and real-life data. We also show possible applications of the top-k similar motifs discovery problem.
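The motif definition above, together with the early-abandoning idea mentioned in the citation statement, can be illustrated with a brute-force baseline. This is a sketch under stated assumptions, not the disk-aware or online algorithms of the cited papers: it works on an in-memory list, skips the z-normalisation that motif papers typically apply, and the names `early_abandon_sq_dist` and `brute_force_motif` are hypothetical.

```python
import math

def early_abandon_sq_dist(x, y, best_so_far):
    """Squared Euclidean distance that gives up once it cannot beat best_so_far.

    Returns math.inf when the running sum reaches best_so_far.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        total += (xi - yi) ** 2
        if total >= best_so_far:
            return math.inf         # abandon: this pair cannot improve the best
    return total

def brute_force_motif(series, m):
    """Return ((i, j), distance) for the closest pair of non-overlapping
    length-m subsequences, or (None, math.inf) if no such pair exists."""
    best_sq, best_pair = math.inf, None
    last_start = len(series) - m
    for i in range(last_start + 1):
        # j starts at i + m so the two subsequences never overlap
        for j in range(i + m, last_start + 1):
            d = early_abandon_sq_dist(series[i:i + m], series[j:j + m], best_sq)
            if d < best_sq:
                best_sq, best_pair = d, (i, j)
    return best_pair, math.sqrt(best_sq)

pair, dist = brute_force_motif([0, 1, 0, 5, 9, 5, 0, 1, 0, 7], 3)
```

Early abandoning leaves the answer unchanged; it only shortens distance computations that are already known to lose, and the pair loop remains quadratic in the series length.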


“…Using these bounds, a superset of the k-NN answers can be returned, which will be then verified using the uncompressed sequences that will need to be fetched and compared with the query, so that the exact distances can be computed. Such filtering ideas are used in the majority of the data-mining literature for speeding up search operations [6,7,17].…”

Section: Searching Data Using Distance Estimates (mentioning)

confidence: 99%

Optimal Distance Estimation Between Compressed Data Series

Freris, Vlachos, Kozat 2012

Proceedings of the 2012 SIAM International Conference on Data Mining


Most real-world data contain repeated or periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate complete orthogonal basis (e.g., Fourier, Wavelets, Karhunen-Loève expansion or Principal Components). In the face of ever increasing data repositories and given that most mining operations are distance-based, it is vital to perform accurate distance estimation directly on the compressed data. However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on the distance based on the available information. In particular, we consider the problem where a distinct set of coefficients is maintained for each sequence, and the L2-norm of the compression error is recorded. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest provable estimation of the L2-norm or the correlation, and executes at least two orders of magnitude faster than a numerical solution based on convex optimization. The contributions of this work extend beyond the purview of periodic data, as our methods are applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.
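The filter-and-verify pattern described in the citation statement can be sketched for the simpler case in which every sequence keeps the same first k coefficients of an orthonormal DFT. By Parseval's theorem, the distance over those coefficients never exceeds the true Euclidean distance, so it is a safe pruning bound. The cited paper addresses the harder case of a distinct coefficient set per sequence, which this sketch does not attempt. All function names and the parameter `k` are illustrative assumptions.

```python
import cmath
import math

def orthonormal_dft(x, k):
    """First k DFT coefficients, scaled by 1/sqrt(n) so the transform is orthonormal."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(k)]

def lower_bound(cx, cy):
    """Distance over the kept coefficients; a lower bound on the true distance."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(cx, cy)))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor(query, database, k=2):
    """1-NN search that skips the exact distance whenever the bound rules a candidate out.

    Returns (index, distance, number_of_exact_distance_computations).
    """
    cq = orthonormal_dft(query, k)
    sketches = [orthonormal_dft(s, k) for s in database]  # precomputed in practice
    best_idx, best_dist, exact_calls = None, math.inf, 0
    for idx, (s, cs) in enumerate(zip(database, sketches)):
        if lower_bound(cq, cs) >= best_dist:
            continue                # filter: cannot beat the current best
        exact_calls += 1            # verify on the uncompressed sequence
        d = euclidean(query, s)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, exact_calls

database = [[0, 0, 0, 0], [1, 2, 3, 4], [1, 2, 3, 5], [9, 9, 9, 9]]
best_idx, best_dist, exact_calls = nearest_neighbor([1, 2, 3, 4.5], database)
```

In this toy database the last sequence is rejected from its coefficients alone, so only three of the four exact distances are computed.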
