Mining Time-series with Trillions of Points: Dynamic Time Warping at scale

Take a similarity measure that’s already well-known to researchers who work with time-series, and devise an algorithm to compute it efficiently at scale. Suddenly intractable problems become tractable, and Big Data mining applications that use the metric are within reach.

The classification, clustering, and searching through time series have important applications in many domains. In medicine EEG and ECG readings translate to time-series data collections with billions (even trillions) of points. In fact many research hospitals have trillions of points of EEG data. Other domains where large time series data collections are routine include gesture recognition & user interface design, astronomy, robotics, geology, wildlife monitoring, security, and biometrics.

The problem is that existing algorithms don’t scale1 to sequences with hundreds of billions or trillions of points. Consider a doctor who needs to search through EEG data (with hundreds of billions of points), for a “prototypical epileptic spike”, where the input query is a time-series snippet with thousands of points.

Recently a team of researchers led by Eamonn Keogh of UC Riverside introduced a set of tools for mining time-series with trillions of points. Their approach allows for ultrafast subsequence search under both Dynamic Time Warping and Euclidean Distance. To put their results in perspective, a time series with one trillion points is “… more than all of the time series data considered in all papers ever published in all data mining conferences combined”.

What is Dynamic Time Warping?
The Euclidean Distance (ED) between two time series {xi} and {yi} is

SQRT[ Σ (xi – yi)2 ]

While ED is easy to define, it performs poorly as a similarity score. Since it aligns the ith point on one time series with the ith point on another, ED is very sensitive to distortions in the time domain. In 1994, Dynamic Time Warping (DTW) was introduced to adjust for this: it allows for elastic shifting in the time domain and matches sequences that are similar but out of phase.

Dynamic Time Warping
Inspired by ideas from speech recognition,
Dynamic time warping is an algorithm for measuring the similarity of two time series.

There are an exponential number of paths (from one time series to the other) through the warping matrix. DTW is computed by identifying an optimal alignment path between two time series sequences. (The optimal path is found using dynamic programming subject to a few constraints.) DTW is the (weighted) sum of all the cells that were visited along the optimal path.

Since its introduction DTW has been enthusiastically applied by researchers from many domains. But it has always been deemed to be too computationally expensive to apply to massive databases.

What’s exciting about the UCR Dynamic Time Warping algorithm (UCR-DTW)
First, UCR-DTW is faster than all current search algorithms – it’s even faster than current Euclidean distance searches. An example cited in a recent paper is a search through a ECG times series with 8.5 billion points. The query was for a sequence comprised of 421 points (“idealized Premature Ventricular Contraction”). The results below are from a multi-core desktop:

Here is a video the UCR team put together of a similar ECG search, conducted on a smaller data set (20.1 million points):

The previous result is representative of what the UCR team found. Here is another set of results based on randomly generated data (the query length is a sequence of length |Q|, and SOTA refers to State-of-the-art):

As a distance measure, DTW is hard to beat. Many other measures have been proposed, but based on empirical evidence2, none are better than DTW. Thanks to the work of the UCR team, DTW is now an option even for massive data sets! In a recent paper, the team behind UCR-DTW cite a few potential applications of their speedier and more efficient algorithm. One in particular might be of interest to researchers interested in ubiquitous computing and the quantified self:

The proliferation of inexpensive low-powered sensors has produced an explosion of interest in monitoring real time streams of medical telemetry and/or Body Area Network (BAN) data. There are dozens of research efforts in this domain that explicitly state that while monitoring under DTW is desirable, it is impossible. Thus, approximations of, or alternatives to DTW are used. Dozens of suggested workarounds have been suggested.
… We believe that the UCR suite makes all of these objections moot. DTW can be used to spot gestures/brainwaves/musical patterns/anomalous heartbeats in real-time, even on low-powered devices, even with multiple channels of data, and even with multiple simultaneous queries.

Why not just index long time series? If the query sequence was of fixed length (specified in advance), then indexing might help3. But if the query sequence was of arbitrary length, UCR-DTW is the only viable algorithm.

Other data mining applications of UCR-DTW
Besides search, the team behind UCR-DTW plans to apply their new algorithm to several (time series) data mining problems including:

1. Time Series motif discovery: motifs are similar subsequences of a long time series
2. Extracting time series shapelets: shapelets are time series primitives that can be used to speed up automatic classification (by reducing the number of “features”).
3. Classification and clustering of time series

Further Reading:

1. Using dynamic time warping to find patterns in time series => the paper that introduced dynamic time warping
2. The web site of the UCR-DTW project
3. Paper that introduced UCR-DTW: Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
4. PowerPoint presentation that gives a nice visual introduction to how DTW is calculated: Dynamic Time Warping Algorithm for Gene Expression Time Series

Related content:

1. Compressed Sensing and Big Data
2. Data Mining with the Maximal Information Coefficient

(1) “Scale and latency” are relative. But generally researchers who’ve spent days/weeks/months collecting data, tend to be fine with an an algorithm that may take a few hours to detect patterns within their data.
(2) In a recent paper, the UCR team noted that “… after an exhaustive literature search of more than 800 papers, we are not aware of any distance measure that has been shown to outperform DTW by a statistically significant amount on reproducible experiments”.
(3) Indexing trillion length objects would still be challenging!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s