Abstract-Many time series data mining problems require subsequence similarity search as a subroutine. While this can be performed with any distance measure, and dozens of distance measures have been proposed in the last decade, there is increasing evidence that Dynamic Time Warping (DTW) is the best measure across a wide range of domains. Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy. Proposed speedup techniques include early abandoning strategies, lower-bound based pruning, indexing and embedding. In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions if we are to tackle the many problems that are currently untenable even with stateof-the-art algorithms running on high-end desktops. With this motivation, we investigate both GPU (Graphics Processing Unit) and FPGA (Field Programmable Gate Array) based acceleration of subsequence similarity search under the DTW measure. As we shall show, our novel algorithms allow GPUs, which are typically bundled with standard desktops, to achieve two orders of magnitude speedup. For problem domains which require even greater scale up, we show that FPGAs costing just a few thousand dollars can be used to produce four orders of magnitude speedup. We conduct detailed case studies on the classification of astronomical observations and similarity search in commercial agriculture, and demonstrate that our ideas allow us to tackle problems that would be simply untenable otherwise.