Time series motifs have been in the literature for about fifteen years, but have only recently begun to receive significant attention in the research community. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, etc. Recent work has improved the scalability to the point where exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology, there is an insatiable need to address even larger datasets. In this work we show that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred million subsequences, by far the largest dataset ever mined for time series motifs. Furthermore, we demonstrate that our algorithm can produce actionable insights in seismology and other domains.
We present a new method to accelerate matched filtering (template matching) of seismic waveforms through efficient calculation of (cross-)correlation coefficients. The cross-correlation method is commonly used to analyze seismic data, for example, to detect repeating or similar seismic waveform signals, earthquake swarms, foreshocks, aftershocks, low-frequency earthquakes (LFEs), and nonvolcanic tremor. Recent growth in the density and coverage of seismic instrumentation demands fast and accurate methods to analyze the correspondingly large volumes of data generated. Historically, two approaches have been used to perform matched filtering: one in the time domain and the other in the frequency domain. Recent studies reveal that time domain matched filtering is memory efficient and frequency domain matched filtering is time efficient, assuming the same amount of computational resources. We show that our super-efficient cross-correlation (SEC-C) method, a frequency domain method that optimizes computations using the overlap-add method, vectorization, and fast normalization, is not only more time efficient than existing frequency domain methods when run on the same number of central processing unit (CPU) threads but also more memory efficient than time domain methods in our test cases. For example, using 30 channels of data with a sample rate of 50 Hz and 30 templates, each with a duration of 8 s, SEC-C uses only 2.3 GB of memory, whereas other frequency domain codes use three times more and parallelized time domain codes use about 30% more. We have implemented a precise, fully normalized version of SEC-C that removes the mean of the data in each sliding window and thus can be applied to raw seismic data. Another strength of the SEC-C method is that it can be used to search for repeating seismic events in a concatenated stack of individual event waveforms. In this use case, our method is more than one order of magnitude faster than conventional methods.
The SEC-C method does not require specialized hardware to achieve its computation speed; instead, it exploits algorithmic ideas that are both time- and memory-efficient and are thus suitable for use on off-the-shelf desktop machines.
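To make the idea of frequency domain matched filtering with fast normalization concrete, the following is a minimal single-channel sketch in Python with NumPy. It is not the SEC-C implementation: the function name `normalized_xcorr_fft` and its arguments are hypothetical, it uses one FFT over the whole trace rather than overlap-add chunks, and it obtains the per-window mean and energy from cumulative sums, which is one common way to normalize quickly and is assumed here purely for illustration.

```python
import numpy as np


def normalized_xcorr_fft(data, template):
    """Sliding normalized cross-correlation of one template against one trace.

    Hypothetical helper for illustration only. The mean of each data window
    is removed implicitly, so raw (non-demeaned) data can be passed in.
    Returns one coefficient per window start, i.e. len(data)-len(template)+1.
    """
    data = np.asarray(data, dtype=float)
    template = np.asarray(template, dtype=float)
    n, m = data.size, template.size
    if m > n:
        raise ValueError("template must not be longer than data")

    # Demean the template once; its norm is the same for every window.
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t * t))

    # Numerator via the frequency domain. Because t sums to zero,
    # correlating it with the raw data gives the same result as
    # correlating it with each window after removing that window's mean.
    nfft = n + m - 1  # long enough that the lags we keep do not wrap around
    spec = np.fft.rfft(data, nfft) * np.conj(np.fft.rfft(t, nfft))
    numer = np.fft.irfft(spec, nfft)[: n - m + 1]

    # Denominator from running sums: sum and sum of squares per window.
    c1 = np.concatenate(([0.0], np.cumsum(data)))
    c2 = np.concatenate(([0.0], np.cumsum(data * data)))
    win_sum = c1[m:] - c1[:-m]
    win_sq = c2[m:] - c2[:-m]
    # Sum of squared deviations from the window mean, clipped at zero
    # to guard against small negative values from rounding.
    win_energy = np.maximum(win_sq - win_sum * win_sum / m, 0.0)
    denom = t_norm * np.sqrt(win_energy)

    # Windows with (near-)zero energy get a coefficient of 0.
    cc = np.zeros_like(numer)
    ok = denom > 1e-12
    cc[ok] = numer[ok] / denom[ok]
    return cc
```

Calling `normalized_xcorr_fft(trace, trace[100:140])` on a noisy trace should peak at index 100 with a value close to 1. Note that the cumulative-sum normalization can lose precision on very long traces or on data with a large offset; processing the trace in chunks, as an overlap-add scheme does, is one way to limit both that error and the memory footprint.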