How does dynamic time warping work?

Dynamic time warping (DTW) checks and compares sequences with different lengths for similarities. The dynamic time warping algorithm generates warping paths. Warping paths recognise matches using backtracking and degree of difference. Time distortion or different speeds do not influence their ability to recognise similarities. DTW is mainly used for speech recognition, digital signature recognition, financial market analysis.

What does dynamic time warping mean?

Dynamic time warping may sound like a technology from Star Trek. However, it is just an algorithm for comparing time sequences. The algorithm maps them on top of each other and compares them. ‘Warping’ the sequences allows similarities and matching patterns between the sequences to be recognised, even when they have different lengths and speeds.

As an example, if two walking sequences were examined for similarities, the algorithm would be able to recognise identical patterns if the walking speed or distance was different. The same goes for comparing signatures of the same person, similarities can be identified even if writing size differs.

What is the purpose of dynamic time warping?

DTW may seem like an abstract concept without any practical benefits. However, dynamic time warping actually performs an important analysis function in a wide variety of industries. It is most beneficial for sequences that can be mapped and compared linearly, such as video and audio sequences, graphics or statistical models. When DTW detects matches, it can trigger certain actions, signals pr functions.

Pattern measurement and pattern recognition in sequences can be used to study similar developments over different time periods. Dynamic time warping is often used in forward-looking technologies such as machine learning, supervised learning and neural networks to train the analysis and response capabilities of self-learning systems and evaluate data sets more efficiently.

How does dynamic time warping work?

DTW searches for optimal matches to identify patterns and similarities in different sequences. The principles ‘one-to-many’ or ‘many-to-one’ are helpful in this case. Different rules and conditions are applied:

  • Each value from the first sequence must be compared with one or more values from the second sequence (and vice versa).
  • The first value from a sequence must be compared with the first value from the second sequence.
  • The last value from a sequence must be compared with the last value from the second sequence.
  • Mapping values from the first sequence to the values from the second sequence must increase monotonically. Values at the beginning and end of the sequences must have matched positions, without omission or overlap.

An optimal match exists when all specifications and conditions are fulfilled. The cost function measures the degree of difference between the sequences’ values. An optimal match requires the cost function to be as low as possible.

The algorithm also creates a warping path that can detect optimal matches in sequences with different lengths. The warping path is created using backtracking. The algorithm maps one or more values from one sequence to points from the second sequence. This can find matches in time sequences with different lengths.

Where is dynamic time warping used?

Dynamic time warping can be used wherever data sets can be mapped and compared in linear sequences. For example, DTW is often used in pre-examination and post-examination in data analysis. This may be audio data, video data, alphanumeric data or Big Data-based data sets.

DTW is also used in:

  • Pattern recognition in audio sequences: DTW is plays an important role in speech recognition. Recorded and stored speech patterns are mapped on top of each other to compare patterns using DTW. DTW uses adaptive time normalisation to create a warping path for audio sequences with different lengths and speeds. This can even detect matches in vowels and consonants with different lengths and speeds.
  • Pattern recognition video sequences: DTW can map video sequences on top of each other for gesture and motion recognition. A comparison and pattern recognition can be performed in motion sequences with different temporal lengths or different speeds.
  • Pattern recognition in financial data: DTW is also used in financial markets and corporate finance. DTW is the best option when creating forecasts of financial cycles, sales curves or stock market trends. DTW can be used to show similar or identical cycles and trends in market and company data over different time periods. The analysis is based on forecasts as well as on business and financial data.

Which tools use dynamic time warping?

The DTW algorithm is found in various open source software. These include:

  • DTW Suite with programming packages for Python and R.
  • FastDTW provides a Java implementation for DTW algorithms.
  • MatchBox implements DTW for audio signals.
  • mlpy and pydtw also provide DTW functions as Python libraries for machine learning.
  • Gesture Recognition provides DTW algorithms for real-time gesture recognition.

The following fast computation techniques use DTW as efficiently as possible:

  • PruneDTW
  • SparseDTW
  • FastDTW
  • MultiscaleDTW

Dynamic time warping in Python

Below is an example of a DTW algorithm in Python code which illustrates its complexity:

def dtw(s, t, window):
    n, m = len(s), len(t)
    w = np.max([window, abs(n-m)])
    dtw_matrix = np.zeros((n+1, m+1))
    
    for i in range(n+1):
        for j in range(m+1):
            dtw_matrix[i, j] = np.inf
    dtw_matrix[0, 0] = 0
    
    for i in range(1, n+1):
        for j in range(np.max([1, i-w]), np.min([m, i+w])+1):
            dtw_matrix[i, j] = 0
    
    for i in range(1, n+1):
        for j in range(np.max([1, i-w]), np.min([m, i+w])+1):
            cost = abs(s[i-1] - t[j-1])
            # take last min from a square box
            last_min = np.min([dtw_matrix[i-1, j], dtw_matrix[i, j-1], dtw_matrix[i-1, j-1]])
            dtw_matrix[i, j] = cost + last_min
    return dtw_matrix
Python

The function presented in the example is passed three parameters. Firstly, the two signals that are being examined are passed in the parameters named s and t. The signals are usually arrays or vectors. The parameter called window can be used to specify how many other elements can be matched.

A matrix is created in the function and then initialised with an infinite value. The main DTW step takes place in the final two nested ‘For’ loops. The value stored in the variable last_min is added to the previous costs that are stored in the variable costs. These result from the distance between the two input values at the respective index.

This value results from previously calculated values in dynamic programming. You can select the minimum from the previously calculated surrounding values and add it to the previously calculated costs. This step determines whether two elements are a direct match, and then adds an element or deletes an element.

After the function has been carried out, you will receive a distance matrix with the warping path.

Dynamic time warping alternatives

Alternatives to DTW pattern recognition in sequences and timings are:

  • Correlation Optimized Warping (COW)
  • Functional data analysis
  • Hidden Markov Model
  • Viterbi algorithm
Was this article helpful?
Page top