A Complete Introduction To Time Series Analysis (with R): Durbin-Levinson Algorithm

[Figure: The equations of the Durbin-Levinson Algorithm]

In the last article, we saw that the best linear predictor of X_{n+h} based on all previous observations X_1, …, X_n has the form

$$P_n X_{n+h} = a_1 X_n + a_2 X_{n-1} + \cdots + a_n X_1,$$

where the coefficients satisfy

$$\Gamma_n \, a_n = \gamma_n(h), \qquad \text{where } \Gamma_n = \big[\gamma(i-j)\big]_{i,j=1}^{n}, \quad \gamma_n(h) = \big(\gamma(h), \gamma(h+1), \ldots, \gamma(h+n-1)\big)'.$$

The coefficients of the best linear predictor satisfy this equation.

We also saw that in practice it is very cumbersome to invert the Gamma matrix on the left. Therefore, we would like a way to find the coefficients without having to invert the matrix by hand.

Durbin-Levinson Algorithm

Let {X_t} be a zero-mean stationary time series. Then, for n = 0, 1, …, the Durbin-Levinson algorithm computes the coefficients

$$\phi_{n1}, \phi_{n2}, \ldots, \phi_{nn}$$

of the one-step best linear predictor, together with its mean squared error

$$v_n = \mathbb{E}\big[(X_{n+1} - P_n X_{n+1})^2\big].$$

Let

$$\phi_{11} = \frac{\gamma(1)}{\gamma(0)} = \rho(1), \qquad v_0 = \gamma(0), \qquad v_1 = v_0\big(1 - \phi_{11}^2\big).$$

Then, the Durbin-Levinson recursion is given by

$$\phi_{nn} = \frac{\gamma(n) - \sum_{k=1}^{n-1} \phi_{n-1,k}\,\gamma(n-k)}{v_{n-1}},$$

$$\begin{pmatrix}\phi_{n1}\\ \vdots \\ \phi_{n,n-1}\end{pmatrix} = \begin{pmatrix}\phi_{n-1,1}\\ \vdots \\ \phi_{n-1,n-1}\end{pmatrix} - \phi_{nn}\begin{pmatrix}\phi_{n-1,n-1}\\ \vdots \\ \phi_{n-1,1}\end{pmatrix}, \qquad v_n = v_{n-1}\big(1 - \phi_{nn}^2\big),$$

for n = 2, 3, … Although it might seem a bit complex, it turns out that by executing this algorithm we recursively obtain the coefficients of the BLP at each step, for as many steps as we need; the coefficients are produced along the way. Further, notice that although we can only predict one step ahead at a time,

$$P_n X_{n+1} = \phi_{n1} X_n + \phi_{n2} X_{n-1} + \cdots + \phi_{nn} X_1,$$

we can actually use these predictions to compute further predictions, recursively. Pretty cool, eh? Since we don't need to invert any matrix, it turns out that this algorithm runs in O(n²) time, one polynomial order below the naive matrix inversion! Further, we also obtain the MSE of the prediction at each time step, as

$$v_n = \mathbb{E}\big[(X_{n+1} - P_n X_{n+1})^2\big] = v_{n-1}\big(1 - \phi_{nn}^2\big).$$
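
To make the recursion concrete, here is a minimal R sketch (my own illustration, not code from the article); it assumes a vector gamma_vec holding the autocovariances γ(0), γ(1), …, γ(n):

```r
# Durbin-Levinson recursion: a minimal illustrative sketch
# gamma_vec = c(gamma(0), gamma(1), ..., gamma(n))
durbin_levinson <- function(gamma_vec) {
  n <- length(gamma_vec) - 1
  phi <- matrix(0, n, n)                    # phi[k, j] stores phi_{kj}
  v <- numeric(n + 1)                       # v[k + 1] stores v_k
  v[1] <- gamma_vec[1]                      # v_0 = gamma(0)
  phi[1, 1] <- gamma_vec[2] / gamma_vec[1]  # phi_{11} = gamma(1) / gamma(0)
  v[2] <- v[1] * (1 - phi[1, 1]^2)          # v_1 = v_0 (1 - phi_{11}^2)
  if (n >= 2) {
    for (k in 2:n) {
      # phi_{kk} = (gamma(k) - sum_{j < k} phi_{k-1,j} gamma(k-j)) / v_{k-1}
      phi[k, k] <- (gamma_vec[k + 1] -
                      sum(phi[k - 1, 1:(k - 1)] * gamma_vec[k:2])) / v[k]
      # phi_{kj} = phi_{k-1,j} - phi_{kk} * phi_{k-1,k-j}, for j = 1, ..., k-1
      phi[k, 1:(k - 1)] <- phi[k - 1, 1:(k - 1)] -
        phi[k, k] * phi[k - 1, (k - 1):1]
      # v_k = v_{k-1} * (1 - phi_{kk}^2)
      v[k + 1] <- v[k] * (1 - phi[k, k]^2)
    }
  }
  list(phi = phi, v = v)
}
```

The last row phi[n, ] then gives the one-step predictor P_n X_{n+1} = φ_{n1} X_n + … + φ_{nn} X_1, v[n + 1] gives its MSE, and the diagonal diag(phi) collects the φ_{kk}, which we will meet again below as the PACF.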

Proof of correctness

This part is actually quite tedious, and not for the faint of heart. I leave it here for the mathematically inclined; otherwise, feel free to skip to the “Partial Autocorrelation Function” section. First, let's define

$$R_n = \frac{\Gamma_n}{\gamma(0)}, \qquad \rho_n = \big(\rho(1), \rho(2), \ldots, \rho(n)\big)', \qquad \rho_n^{(r)} = \big(\rho(n), \rho(n-1), \ldots, \rho(1)\big)'.$$

That is, the first one is just the Gamma matrix scaled by the variance (the autocovariance at lag zero). We also define two vectors: one that collects all correlations from lag 1 to lag n, and another that is the same vector but reversed. Now note that

$$\Gamma_n\,\phi_n = \gamma_n(1) \quad \Longleftrightarrow \quad R_n\,\phi_n = \rho_n, \qquad \text{where } \phi_n = \big(\phi_{n1}, \ldots, \phi_{nn}\big)'.$$

The first equation holds from what we have previously derived in the last article, that is

$$\Gamma_n\,a_n = \gamma_n(h).$$

Here, with h = 1, we simply give the coefficients another name. Let's now examine why this algorithm works, using mathematical induction.

For the base case n = 1, the algorithm sets

$$\phi_{11} = \frac{\gamma(1)}{\gamma(0)} = \rho(1),$$

so that

$$R_1\,\phi_{11} = 1 \cdot \rho(1) = \rho_1.$$

Indeed, this is the correct coefficient, as we had previously shown that

$$a_1 = \frac{\gamma(1)}{\gamma(0)} = \rho(1)$$

minimizes the MSE when using only the last observation.

For the inductive step, assume that R_{n-1} φ_{n-1} = ρ_{n-1}, and consider the vector produced by the recursion,

$$\phi_n = \begin{pmatrix}\phi_{n-1} - \phi_{nn}\,\phi_{n-1}^{(r)}\\ \phi_{nn}\end{pmatrix},$$

where φ_{n-1}^{(r)} denotes φ_{n-1} with its entries reversed.

Note that we essentially want to show that this vector satisfies R_n φ_n = ρ_n, i.e., that it correctly produces the coefficients of the best linear predictor (whose last components, as we will see, are precisely the partial autocorrelations)!

$$\begin{aligned}
R_n\,\phi_n &= \begin{pmatrix} R_{n-1} & \rho_{n-1}^{(r)} \\ \rho_{n-1}^{(r)\prime} & 1 \end{pmatrix}
\begin{pmatrix} \phi_{n-1} - \phi_{nn}\,\phi_{n-1}^{(r)} \\ \phi_{nn} \end{pmatrix} \\[4pt]
&= \begin{pmatrix} R_{n-1}\phi_{n-1} - \phi_{nn}\,R_{n-1}\phi_{n-1}^{(r)} + \phi_{nn}\,\rho_{n-1}^{(r)} \\ \rho_{n-1}^{(r)\prime}\phi_{n-1} - \phi_{nn}\,\rho_{n-1}^{(r)\prime}\phi_{n-1}^{(r)} + \phi_{nn} \end{pmatrix} \\[4pt]
&= \begin{pmatrix} \rho_{n-1} - \phi_{nn}\big(R_{n-1}\phi_{n-1}^{(r)} - \rho_{n-1}^{(r)}\big) \\ \rho_{n-1}^{(r)\prime}\phi_{n-1} + \phi_{nn}\big(1 - \rho_{n-1}^{(r)\prime}\phi_{n-1}^{(r)}\big) \end{pmatrix} \\[4pt]
&= \begin{pmatrix} \rho_{n-1} \\ \rho_{n-1}^{(r)\prime}\phi_{n-1} + \phi_{nn}\big(1 - \rho_{n-1}^{(r)\prime}\phi_{n-1}^{(r)}\big) \end{pmatrix}.
\end{aligned}$$

Let’s see what’s going on here:

  1. First, we simply plug in the definition of the matrix, whose blocks are precisely given that way (remember that this is a symmetric matrix!), together with the rightmost vector, which is given by the definitions of the algorithm in the recursive step.
  2. We then proceed to perform the block matrix multiplication, as in the second step (you can verify this).
  3. In the third step, we distribute the terms, notice that the first term of the upper entry can be replaced using the induction hypothesis, and then rearrange terms and factorize in both the upper and lower entries.
  4. In the last step, we see that the second term of the upper entry is zero, by plugging in the induction hypothesis (together with the symmetry of R_{n-1}, which gives R_{n-1} φ_{n-1}^{(r)} = ρ_{n-1}^{(r)}).

Although this might seem very confusing at first, I highly recommend you go through it by yourself to fully understand it. Now, after the last step, we know that

$$R_{n-1}\big(\phi_{n-1} - \phi_{nn}\,\phi_{n-1}^{(r)}\big) + \phi_{nn}\,\rho_{n-1}^{(r)} = \rho_{n-1},$$

so the first n - 1 equations of R_n φ_n = ρ_n are satisfied.

Now, one thing that we have to show is that

$$v_{n-1} = \gamma(0) - \phi_{n-1}'\,\gamma_{n-1} = \gamma(0)\big(1 - \phi_{n-1}'\,\rho_{n-1}\big) = \gamma(0)\big(1 - \rho_{n-1}^{(r)\prime}\,\phi_{n-1}^{(r)}\big).$$

This follows since, as we showed in the last article, the MSE of the BLP has the vectorial form above; in the last equality, we simply factored out γ(0) (and used the fact that reversing both vectors leaves the inner product unchanged). We can use these facts for the second entry of the last vector in the equation above to obtain

$$\rho_{n-1}^{(r)\prime}\phi_{n-1} + \phi_{nn}\big(1 - \rho_{n-1}^{(r)\prime}\phi_{n-1}^{(r)}\big) = \rho_{n-1}^{(r)\prime}\phi_{n-1} + \phi_{nn}\,\frac{v_{n-1}}{\gamma(0)}$$

$$= \frac{\sum_{k=1}^{n-1}\phi_{n-1,k}\,\gamma(n-k) + \gamma(n) - \sum_{k=1}^{n-1}\phi_{n-1,k}\,\gamma(n-k)}{\gamma(0)} = \frac{\gamma(n)}{\gamma(0)} = \rho(n).$$

Thus, using these results we have shown that

$$R_n\,\phi_n = \begin{pmatrix}\rho_{n-1}\\ \rho(n)\end{pmatrix} = \rho_n.$$

Finally, we need to show that

$$v_n = v_{n-1}\big(1 - \phi_{nn}^2\big).$$

We can do this as follows:

$$\begin{aligned}
v_n &= \gamma(0) - \phi_n'\,\gamma_n = \gamma(0)\big(1 - \phi_n'\,\rho_n\big) \\
&= \gamma(0)\Big(1 - \big(\phi_{n-1} - \phi_{nn}\,\phi_{n-1}^{(r)}\big)'\rho_{n-1} - \phi_{nn}\,\rho(n)\Big) \\
&= \gamma(0)\big(1 - \phi_{n-1}'\,\rho_{n-1}\big) - \gamma(0)\,\phi_{nn}\big(\rho(n) - \phi_{n-1}^{(r)\prime}\rho_{n-1}\big) \\
&= v_{n-1} - \gamma(0)\,\phi_{nn}\,\frac{\phi_{nn}\,v_{n-1}}{\gamma(0)} \\
&= v_{n-1}\big(1 - \phi_{nn}^2\big).
\end{aligned}$$

For the first two lines, we are just using the facts that we found before, and then we are simply applying the recursive definitions of the algorithm. After some algebraic manipulations, we arrive at the desired answer.

Partial Autocorrelation Function

The partial autocorrelation function (PACF) of a stationary time series {X_t} is defined by

$$\alpha(0) = 1,$$

$$\alpha(h) = \phi_{hh}, \qquad h \geq 1,$$

where φ_{hh} is the last component of

$$\phi_h = \Gamma_h^{-1}\,\gamma_h.$$

More explicitly,

$$\alpha(h) = \mathrm{Corr}\big(X_{h+1} - P(X_{h+1}\mid X_2, \ldots, X_h),\; X_1 - P(X_1\mid X_2, \ldots, X_h)\big), \qquad h \geq 2,$$

with α(1) = Corr(X_2, X_1) = ρ(1).

Indeed, what this is telling us is the correlation between the last timestamp and the first one, after removing the effect of the timestamps in between. The interesting thing is that, thanks to the Durbin-Levinson algorithm, we already obtain these coefficients, which also happen to be the PACF! Remember that very early in this series, we actually presented the PACF briefly but didn't talk much about how it came to be. Now you know how :). Make sure to take a look back at the Classical Decomposition and Differencing articles to see how to interpret the PACF!

Note: In practice, we do not know the actual ACV values of interest. However, we can simply use the sample ACV from the data as estimates!
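
For instance, assuming the durbin_levinson sketch from earlier and some numeric series x (both hypothetical here), the sample autocovariances returned by the built-in acf function can be fed straight into the recursion:

```r
# Sample autocovariances gamma_hat(0), ..., gamma_hat(10) of a series x
gamma_hat <- acf(x, lag.max = 10, type = "covariance", plot = FALSE)$acf[, 1, 1]

dl <- durbin_levinson(gamma_hat)  # the sketch defined earlier
diag(dl$phi)                      # estimated phi_11, ..., phi_{10,10}: the sample PACF
```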

How to R

We will present a very brief example of using the Durbin-Levinson algorithm. For this, we will need the R package FitAR. First, we construct an AR(4) model, that is, a model that depends on the last 4 observations, i.e.

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \phi_4 X_{t-4} + Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2).$$

In the code below, we first set a random seed, then define a vector with the true values of the AR(4) coefficients. We convert this into a list and use the arima.sim function to simulate some data accordingly. Then, we compute the sample autocovariance function of the whole series using the built-in acf function, and finally call FitAR::PacfDL to calculate the partial autocorrelations as follows:
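
A sketch along these lines (the seed, coefficient values, and sample size are illustrative choices, not necessarily those used originally):

```r
library(FitAR)

set.seed(42)                                           # random seed (value assumed)
phi_true <- c(0.4, 0.2, -0.2, 0.1)                     # hypothetical AR(4) coefficients
x <- arima.sim(model = list(ar = phi_true), n = 500)   # simulate the AR(4) series

# Sample autocovariances of the whole series
gamma_hat <- acf(x, type = "covariance", plot = FALSE)$acf[, 1, 1]

# Durbin-Levinson via FitAR, using the autocovariances up to lag 4
FitAR::PacfDL(gamma_hat[1:5], LinearPredictor = TRUE)
```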

We specify a lag of 4 because we know that our series comes from an AR(4) process. If we didn't know this, we would first have to fit a number of models to determine it (we will see how later). We can also compare this to the built-in function for partial autocorrelations:
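
A possible comparison (again just a sketch), using the built-in pacf routine on the simulated series:

```r
# Built-in sample partial autocorrelations for the same series
pacf(x, lag.max = 4, plot = FALSE)
```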

Finally, we can visualize it with ggPacf :
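
Assuming ggPacf from the forecast package (together with ggplot2), a call along these lines produces the figure below:

```r
library(forecast)   # provides ggPacf / ggAcf
library(ggplot2)

ggPacf(x, lag.max = 20) +
  ggtitle("Sample PACF of the simulated AR(4) series")
```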

[Figure: sample PACF plot of the simulated AR(4) series]

Notice something interesting? It is not a coincidence that exactly four lags fall outside the confidence bounds, and that their sizes are related to the magnitudes of the AR(4) model coefficients. We will see why this is the case in a later article.

Note: Example adapted from the FitAR package documentation.

Next time

And that’s it for today! As you could see, the Durbin-Levinson algorithm saves us from having to invert the Gamma matrix, while still recursively producing the desired coefficients, along with the prediction MSEs and the PACF. Stay tuned, and see you next time!

Last time

Best Linear Predictor (Part I)

Main page

Follow me at

  1. https://blog.jairparraml.com/
  2. https://www.linkedin.com/in/hair-parra-526ba19b/
  3. https://github.com/JairParra
  4. https://medium.com/@hair.parra
