## Monday, 17 February 2014

### Multivariate Wiener Filter Derivation

The Wiener filter is the optimal filter for removing Gaussian noise from a Gaussian signal. Most useful applications, though, involve multiple (possibly correlated) signals with additive noise, so a multivariate Wiener filter is required. In this post I'll derive the multivariate Wiener filter as a maximum a posteriori (MAP) estimator of the clean signal given the noisy signal. Because Gaussians are symmetric about their mean, the MAP estimator is identical to the MMSE estimator, with the advantage that the MAP estimator is much easier to compute.

So our assumption is that we have a noisy signal $$y(n)$$, which is the result of adding Gaussian noise $$d(n)$$ to our clean signal $$x(n)$$. Remember that each of these is a vector, so $$y(n)$$, $$d(n)$$ and $$x(n)$$ are vectors with $$k$$ dimensions.

$y(n) = x(n) + d(n)$

What we want to do is find the most likely value of $$x(n)$$ given $$y(n)$$. We never get to know $$d(n)$$, but we do have an estimate of the covariance of $$d(n)$$ which we will call $$\Sigma_d$$. We assumed initially that $$x(n)$$ and $$d(n)$$ were distributed according to the multivariate Gaussian distribution, but because the pdf takes up so much space, I'll be using the following notation:

$a \sim \mathcal{N}(\boldsymbol\mu_a,\boldsymbol\Sigma_a) = (2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma_a|^{-\frac{1}{2}}\, e^{ -\frac{1}{2}(\mathbf{a}-\boldsymbol\mu_a)'\boldsymbol\Sigma_a^{-1}(\mathbf{a}-\boldsymbol\mu_a) }$

This means $$a$$ is distributed according to a multivariate Gaussian with mean $$\boldsymbol\mu_a$$ and covariance $$\boldsymbol\Sigma_a$$. So we have $$d(n) \sim \mathcal{N}(0,\boldsymbol\Sigma_d)$$ and $$x(n) \sim \mathcal{N}(\boldsymbol\mu_x,\boldsymbol\Sigma_x)$$.

From our initial assumption that $$y(n) = x(n) + d(n)$$, we can also determine the pdf of $$y(n)$$ if we know $$x(n)$$. This is simply $$p(y | x) = \mathcal{N}(x,\boldsymbol\Sigma_d)$$, i.e. we know $$y$$ will be centered around $$x$$ with a variance due only to the noise. We now have the distribution of the noisy signal given the clean signal; unfortunately, we never get to know the clean signal. What we want is the distribution of the clean signal given the noisy signal. From here we can use Bayes' rule:

$p(x | y) = \dfrac{p(y|x)p(x)}{p(y)}$

This gives us the pdf of the clean signal given the noisy signal; we then just have to find the maximum of this distribution to obtain our estimate of $$x$$. Note that $$p(y)$$ in the above equation does not affect the maximisation (because it does not depend on $$x$$), so it will be left out. From here we write out everything we have, using the values for $$p(x)$$ and $$p(y|x)$$ we worked out above, hopefully fitting everything on the page:

$p(x | y) \propto p(y|x)p(x)$ $=(2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma_d|^{-\frac{1}{2}}\, e^{ -\frac{1}{2}(y - x)'\boldsymbol\Sigma_d^{-1}(y-x)} \times (2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma_x|^{-\frac{1}{2}}\, e^{ -\frac{1}{2}(x-\boldsymbol\mu_x)'\boldsymbol\Sigma_x^{-1}(x-\boldsymbol\mu_x)}$

We want to maximise the equation above, unfortunately it is rather complicated. To get around this we take the log of the whole thing, since maximising a function and maximising its log will give the same point. The result is much easier to differentiate:

$\log(p(x | y)) = \log(C) - \frac{1}{2}(y - x)'\boldsymbol\Sigma_d^{-1}(y-x)- \frac{1}{2}(x-\boldsymbol\mu_x)'\boldsymbol\Sigma_x^{-1}(x-\boldsymbol\mu_x)$

where $$C$$ consists of all the bits that don't depend on $$x$$.

$C = (2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma_d|^{-\frac{1}{2}} (2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma_x|^{-\frac{1}{2}}$

Now, to maximise $$\log(p(x|y))$$, we differentiate it with respect to $$x$$. This involves some vector calculus; the rules we use can be seen in The Matrix Cookbook (page 11, equations 85 and 86).

$\dfrac{d\log(p(x | y))}{dx} = \boldsymbol\Sigma_d^{-1}(y-x) - \boldsymbol\Sigma_x^{-1}(x-\boldsymbol\mu_x)$

To find the maximum we set this derivative to zero, then solve for $$x$$:

$0 = \boldsymbol\Sigma_d^{-1}(y-x) - \boldsymbol\Sigma_x^{-1}(x-\boldsymbol\mu_x)$

With a tiny bit of algebra, we can manipulate the above equation to isolate $$x$$,

$\hat{x} = \boldsymbol\Sigma_x(\boldsymbol\Sigma_x + \boldsymbol\Sigma_d)^{-1}(y - \boldsymbol\mu_x) + \boldsymbol\mu_x$

and that is it. To get an estimate of the clean signal, we just plug the noisy signal into the equation above. This is optimal (i.e. the best that is theoretically possible) if $$x$$ and $$d$$ are both Gaussian. It can still be reasonable if they are not, and you can repeat the above steps for different distributions. If the distributions aren't symmetric, then the MMSE estimator will usually perform better than the MAP estimator.
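As a quick illustration, the final estimator is essentially a one-liner in Julia. This is just a sketch; the function name `wiener_estimate` is my own:

```julia
# MAP (= MMSE, for Gaussians) estimate of the clean signal x given
# the noisy observation y, assuming x ~ N(mu_x, Sigma_x) and
# independent additive noise d ~ N(0, Sigma_d).
function wiener_estimate(y, mu_x, Sigma_x, Sigma_d)
    # x_hat = mu_x + Sigma_x (Sigma_x + Sigma_d)^-1 (y - mu_x),
    # using `\` rather than an explicit matrix inverse.
    return mu_x .+ Sigma_x * ((Sigma_x .+ Sigma_d) \ (y .- mu_x))
end
```

When the noise covariance dominates, the estimate shrinks towards $$\boldsymbol\mu_x$$; when the noise is negligible, it approaches $$y$$.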

## Sunday, 9 February 2014

### LPCs using the Burg method in Julia

This post will discuss the Burg method of computing LPCs. Other methods include the Autocorrelation method, the Covariance method and the Modified Covariance method.

The Burg method builds on the modified covariance (MCOV) method. The prediction error is defined in exactly the same way; however, the MCOV method may produce an unstable filter because it uses unconstrained optimisation to obtain the LPC coefficients. The Burg method minimises the prediction error under the Levinson constraint:

$a_{p,k} = a_{p-1,k} + a_{p,p}a_{p-1,p-k}$

Let's define the forward and backward prediction errors as:

$f_p(n) = \sum_{k=0}^p a_{p,k}x(n+p-k)$

and

$b_p(n) = \sum_{k=0}^p a_{p,k}x(n+k)$

Minimisation of the prediction error power with respect to $$a_{p,p}$$ leads to

$a_{p,p} = -\dfrac{2 \sum_{n=0}^{N-p-1} b_{p-1}(n) f_{p-1}(n+1)}{\sum_{n=0}^{N-p-1} \left(b_{p-1}^2(n) + f_{p-1}^2(n+1)\right)}$

This leads to a recursive equation described below.

Initialise:

$P_0 = \dfrac{1}{N}\sum_{n=0}^{N-1}x^2(n)$ $f_0(n) = b_0(n) = x(n), \qquad n = 0,1,\ldots,N-1$

For $$p = 1,2,\ldots,L$$:

$a_{p,p} = -\dfrac{2 \sum_{n=0}^{N-p-1} b_{p-1}(n) f_{p-1}(n+1)}{\sum_{n=0}^{N-p-1} \left(b_{p-1}^2(n) + f_{p-1}^2(n+1)\right)}$ $a_{p,k} = a_{p-1,k} + a_{p,p}a_{p-1,p-k}, \qquad k=1,2,\ldots,p-1$ $P_p = P_{p-1}(1-a_{p,p}^2)$ $f_p(n) = f_{p-1}(n+1) + a_{p,p}b_{p-1}(n)$ $b_p(n) = b_{p-1}(n) + a_{p,p}f_{p-1}(n+1)$

where $$G_L^2 = P_L$$ gives the gain term of the resulting filter. A Julia implementation of this algorithm:
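The recursion above can be sketched in Julia as follows (the function name `lpc_burg` is my choice; signal indices are shifted from the 0-based maths to Julia's 1-based arrays):

```julia
# Burg method: compute L LPCs and the prediction error power P_L.
function lpc_burg(x::Vector{Float64}, L::Int)
    N = length(x)
    f = copy(x)                     # f_0(n) = x(n)
    b = copy(x)                     # b_0(n) = x(n)
    a = [1.0]                       # a_{0,0} = 1
    P = sum(x .^ 2) / N             # P_0
    for p in 1:L
        # reflection coefficient a_{p,p}
        num = 2 * sum(b[n] * f[n+1] for n in 1:N-p)
        den = sum(b[n]^2 + f[n+1]^2 for n in 1:N-p)
        k = -num / den
        # Levinson constraint: a_{p,k} = a_{p-1,k} + a_{p,p} a_{p-1,p-k}
        a = [a; 0.0] .+ k .* [0.0; reverse(a)]
        P *= 1 - k^2
        # update the forward and backward error sequences
        fnew = f[2:N-p+1] .+ k .* b[1:N-p]
        bnew = b[1:N-p] .+ k .* f[2:N-p+1]
        f, b = fnew, bnew
    end
    return a[2:end], P              # drop a_{L,0} = 1
end
```

By construction the reflection coefficients satisfy $$|a_{p,p}| \leq 1$$ (Cauchy-Schwarz on the numerator), which is what guarantees a stable filter.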

## Wednesday, 5 February 2014

### Log Area Ratios and Inverse Sine coefficients

Linear prediction coefficients are used in speech coding to compactly characterise the short-time power spectrum. Unfortunately, LPCs are not very suitable for quantisation due to their large dynamic range; in addition, small quantisation errors in individual LPC coefficients can produce relatively large spectral errors and can also result in instability of the filter. To get around this, it is necessary to transform the LPCs into other representations that ensure the stability of the LPC filter after quantisation. The available representations include: reflection coefficients (RCs), log area ratios (LARs), inverse sine coefficients (ISs, also called arcsine reflection coefficients), and line spectral frequencies (LSFs).

Reflection coefficients are computed from LPCs using the Levinson-Durbin recursion, and have two main advantages over LPCs: 1) they are less sensitive to quantisation than LPCs, and 2) the stability of the resulting filter can easily be ensured by keeping all reflection coefficients between -1 and 1.

Though RCs are less sensitive to quantisation than LPCs, they have the drawback that their spectral sensitivity is U-shaped, having large values whenever the magnitude of these coefficients is close to unity. This means the coefficients are very sensitive to quantisation distortion when they represent narrow bandwidth poles. However, this can be overcome by using a non-linear transformation of the RCs. Two such transformations are LARs and ISs.

More information on this topic can be sourced from Speech Coding and Synthesis.

## Log Area Ratios

Log area ratios (LARs) are computed from reflection coefficients for transmission over a channel. They are not as good as line spectral frequencies (LSFs), but they are much simpler to compute. If $$k$$ is a reflection coefficient, LARs can be computed using the following formula:

$\text{LAR} = \log \left( \dfrac{1+k}{1-k} \right)$

The inverse operation, computing reflection coefficients from the LARs, is:

$k = \dfrac{e^{\text{LAR}}-1}{e^{\text{LAR}}+1}$

As an example, if you have the following reflection coefficients: -0.9870, 0.9124, -0.2188, 0.0547, -0.0383, -0.0366, -0.0638, -0.0876, -0.1178, -0.1541, then the corresponding LARs are -5.0304, 3.0836, -0.4447, 0.1095, -0.0766, -0.0732, -0.1279, -0.1757, -0.2367, -0.3106.

## Inverse Sine Coefficients

IS parameters are another transformation of reflection coefficients; their formula is as follows:

$\text{IS} = \dfrac{2}{\pi} \sin^{-1} (k)$

The inverse operation, computing reflection coefficients from the ISs, is:

$k = \sin \left( \dfrac{\pi \text{IS}}{2} \right)$

Using the same reflection coefficients as previously, the IS parameters are -0.8973, 0.7316, -0.1404, 0.0348, -0.0244, -0.0233, -0.0407, -0.0559, -0.0752, -0.0985.
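Both transforms and their inverses are one-liners in Julia. A sketch, with helper names (`lar`, `lar_inv`, `is`, `is_inv`) of my own choosing:

```julia
# Log area ratio and inverse sine transforms of a reflection
# coefficient k (requires |k| < 1), plus their inverses.
lar(k) = log((1 + k) / (1 - k))
lar_inv(g) = (exp(g) - 1) / (exp(g) + 1)
is(k) = (2 / pi) * asin(k)
is_inv(s) = sin(pi * s / 2)
```

Both transforms stretch the region near $$|k| = 1$$, which is exactly where the spectral sensitivity of the raw reflection coefficients is worst.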

## Tuesday, 4 February 2014

### LPCs using the Modified Covariance Method in Julia

This post will discuss the modified covariance method (a.k.a. the forward-backward method) of computing LPCs. Other methods include the Autocorrelation method, the Covariance method and the Burg method.

Linear prediction coefficients (LPCs) are used a lot in signal processing, and the theory behind them is covered in other places (see here for an intuitive explanation). In this post I will go over how to implement LPCs using the modified covariance method in Julia. Given a signal $$x$$, the job of LPCs is to predict the current sample, i.e. $$x(n)$$, using a weighted sum of previous samples:

$x(n) = -\sum_{i=1}^L a(i) x(n-i) + e(n)$

Ideally, $$e(n)$$, the error, will be zero, but usually it is not. As a result we want values of $$a(i)$$ that make $$e(n)$$ as small as possible. The autocorrelation method resulted in the Yule-Walker equations, which have a nice structure (the resulting matrix is Toeplitz). The modified covariance method results in a similar set of equations, but the resulting matrix is not Toeplitz, so we can't use the Levinson-Durbin algorithm to solve it. This means we have to rely on good old matrix inversion. The modified covariance equations we have to solve look like this:

$\sum_{i=1}^L \phi_{i,k} a(i) = -\phi_{k,0} \qquad 1 \leq k \leq L$

where the $$\phi$$s are defined as:

$\phi_{i,k} = \dfrac{1}{2(N-L)} \sum_{n=0}^{N-L-1} \left[ x(n+k)x(n+i) + x(n+L-k)x(n+L-i) \right]$

where $$N$$ is the length of the signal we are working with, and $$L$$ is the order of the model we want. The matrix equation that results looks like this:

$\left[\begin{array}{cccc} \phi_{1,1} & \phi_{2,1} & \ldots & \phi_{L,1} \\ \phi_{1,2} & \phi_{2,2} & \ldots & \phi_{L,2} \\ \vdots & \vdots & \ddots & \vdots\\ \phi_{1,L} & \phi_{2,L} & \ldots & \phi_{L,L} \end{array}\right] \left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(L) \end{array}\right] = -\left[\begin{array}{c} \phi_{1,0} \\ \phi_{2,0} \\ \vdots \\ \phi_{L,0} \end{array}\right]$

This can also be written as:

$Pa = -p$

Note that the matrix $$P$$ is symmetric but not Toeplitz. To solve it we use:

$a = P^{-1}({-p})$

Once we have the values of $$a$$ we are done. Julia code for this procedure follows.
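A sketch of that procedure in Julia (the function name `lpc_mcov` is my own; it solves $$Pa = -p$$ with Julia's `\` rather than an explicit inverse):

```julia
# Modified covariance (forward-backward) method for LPCs.
function lpc_mcov(x::Vector{Float64}, L::Int)
    N = length(x)
    # phi(i, k) as defined above; the 0-based signal index n
    # becomes n + 1 in Julia's 1-based arrays.
    phi(i, k) = sum(x[n+k+1] * x[n+i+1] + x[n+L-k+1] * x[n+L-i+1]
                    for n in 0:N-L-1) / (2 * (N - L))
    P = [phi(i, k) for k in 1:L, i in 1:L]
    p = [phi(k, 0) for k in 1:L]
    return -(P \ p)    # solve P a = -p
end
```

A quick sanity check: a pure sinusoid $$\cos(\omega n)$$ satisfies $$x(n) = 2\cos(\omega)x(n-1) - x(n-2)$$ exactly in both time directions, so an order-2 model should recover $$a = (-2\cos\omega,\ 1)$$.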

### LPCs using the Covariance Method in Julia

This post will discuss the covariance method of computing LPCs. Other methods include the Autocorrelation method, the modified covariance (Forward-Backward) method and the Burg method.

Linear prediction coefficients (LPCs) are used a lot in signal processing, and the theory behind them is covered in other places (see here for an intuitive explanation). In this post I will go over how to implement LPCs using the covariance method in Julia. Given a signal $$x$$, the job of LPCs is to predict the current sample, i.e. $$x(n)$$, using a weighted sum of previous samples:

$x(n) = -\sum_{i=1}^L a(i) x(n-i) + e(n)$

Ideally, $$e(n)$$, the error, will be zero, but usually it is not. As a result we want values of $$a(i)$$ that make $$e(n)$$ as small as possible. The autocorrelation method resulted in the Yule-Walker equations, which have a nice structure (the resulting matrix is Toeplitz). The covariance method results in a similar set of equations, but the resulting matrix is not Toeplitz, so we can't use the Levinson-Durbin algorithm to solve it. This means we have to rely on good old matrix inversion. The covariance equations we have to solve look like this:

$\sum_{i=1}^L \phi_{i,k} a(i) = -\phi_{k,0} \qquad 1 \leq k \leq L$

where the $$\phi$$s are defined as:

$\phi_{i,k} = \dfrac{1}{N-L} \sum_{n=L}^{N-1} x(n-k)x(n-i)$

where $$N$$ is the length of the signal we are working with, and $$L$$ is the order of the model we want. The matrix equation that results looks like this:

$\left[\begin{array}{cccc} \phi_{1,1} & \phi_{2,1} & \ldots & \phi_{L,1} \\ \phi_{1,2} & \phi_{2,2} & \ldots & \phi_{L,2} \\ \vdots & \vdots & \ddots & \vdots\\ \phi_{1,L} & \phi_{2,L} & \ldots & \phi_{L,L} \end{array}\right] \left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(L) \end{array}\right] = -\left[\begin{array}{c} \phi_{1,0} \\ \phi_{2,0} \\ \vdots \\ \phi_{L,0} \end{array}\right]$

This can also be written as:

$Pa = -p$

Note that the matrix $$P$$ is symmetric but not Toeplitz. To solve it we use:

$a = P^{-1}({-p})$

Once we have the values of $$a$$ we are done. Julia code for this procedure follows.
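A minimal sketch of the procedure in Julia (the function name `lpc_cov` is my own):

```julia
# Covariance method for LPCs.
function lpc_cov(x::Vector{Float64}, L::Int)
    N = length(x)
    # phi(i, k) as defined above (0-based index n shifted to 1-based arrays)
    phi(i, k) = sum(x[n-k+1] * x[n-i+1] for n in L:N-1) / (N - L)
    P = [phi(i, k) for k in 1:L, i in 1:L]
    p = [phi(k, 0) for k in 1:L]
    return -(P \ p)    # solve P a = -p
end
```

For an exactly predictable signal the method is exact: a geometric sequence $$x(n) = r^n$$ gives an order-1 coefficient of exactly $$-r$$, since the forward prediction error can be driven to zero.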

## Saturday, 1 February 2014

### Implementing Linear Prediction Coefficients in Julia

This post will discuss the autocorrelation or Yule-Walker method of computing LPCs. Other methods include the Covariance method, the modified Covariance Method (Forward-Backward method) and the Burg method.

Linear prediction coefficients (LPCs) are used a lot in signal processing, and the theory behind them is covered in other places (see here for an intuitive explanation). In this post I will go over how to implement LPCs in Julia. Given a signal $$x$$, the job of LPCs is to predict the current sample, i.e. $$x(n)$$, using a weighted sum of previous samples:

$x(n) = -\sum_{i=1}^L a(i) x(n-i) + e(n)$

Ideally, $$e(n)$$, the error, will be zero but usually it is not. As a result we want values of $$a(i)$$ that make $$e(n)$$ as small as possible. Formally, we try to minimise the prediction error power:

$P = \dfrac{1}{N}\sum_{n=-\infty}^{\infty} e(n)^2$ $= \dfrac{1}{N}\sum_{n=-\infty}^{\infty} \left[ x(n) + \sum_{k=1}^{L} a(k) x(n-k) \right]^2$

What we want to do is minimise $$P$$ with respect to the $$a(i)$$. When this is done, we get a set of equations that must be satisfied for $$P$$ to be minimised: the well-known Yule-Walker equations:

$\sum_{i=1}^L R_{xx}(i-k) a(i) = -R_{xx}(k) \qquad 1 \leq k \leq L$

where $$R_{xx}$$ is the autocorrelation sequence of our signal $$x$$. We can write out the Yule-Walker equations as a matrix equation, so solving for the LPCs (values of $$a(i)$$) becomes easy:

$\left[\begin{array}{cccc} R(0) & R(1) & \ldots & R(L-1) \\ R(1) & R(0) & \ldots & R(L-2) \\ \vdots & \vdots & \ddots & \vdots\\ R(L-1) & R(L-2) & \ldots & R(0) \end{array}\right] \left[\begin{array}{c} a(1) \\ a(2) \\ \vdots \\ a(L) \end{array}\right] = -\left[\begin{array}{c} R(1) \\ R(2) \\ \vdots \\ R(L) \end{array}\right]$

where $$R(0)$$ is the same as $$R_{xx}(0)$$ in the previous equations, and the $$a(i)$$ are the LPCs we want to compute. The naive way of solving the Yule-Walker equations is to invert the square matrix of autocorrelation coefficients; however, this has $$O(L^3)$$ complexity. There is a faster way that exploits the special structure of the $$R$$ matrix (it is Toeplitz): the Levinson-Durbin algorithm, which can find the values of $$a(i)$$ with $$O(L^2)$$ complexity.

The Levinson-Durbin algorithm looks a little like the following:

For $$m = 1$$:

$a_{1,1} = -\dfrac{R(1)}{R(0)}$ $P_1 = R(0)(1-a_{1,1}^2)$

For $$m = 2,3,\ldots,L$$:

$a_{m,m} = -\dfrac{R(m) + \sum_{i=1}^{m-1} a_{m-1,i}R(m-i)}{P_{m-1}}$ $a_{m,i} = a_{m-1,i} + a_{m,m}a_{m-1,m-i} \qquad i=1,2,\ldots,m-1$ $P_m = P_{m-1}(1-a_{m,m}^2)$

Our final LPCs are a(1:L) = $$a_{L,1:L}$$, and the gain term is P = $$P_L$$, where $$L$$ is the number of LPCs we want. In Julia code, this will look like the following:
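A sketch of the recursion (the function name `lpc_autocor` and the biased autocorrelation estimate are my choices):

```julia
# Autocorrelation (Yule-Walker) method for LPCs, solved with the
# Levinson-Durbin recursion.
function lpc_autocor(x::Vector{Float64}, L::Int)
    N = length(x)
    # biased autocorrelation estimate R(0), R(1), ..., R(L)
    R = [sum(x[1:N-m] .* x[1+m:N]) / N for m in 0:L]
    a = [-R[2] / R[1]]              # a_{1,1} = -R(1)/R(0)
    P = R[1] * (1 - a[1]^2)         # P_1
    for m in 2:L
        # reflection coefficient a_{m,m}
        k = -(R[m+1] + sum(a[i] * R[m-i+1] for i in 1:m-1)) / P
        a = [a .+ k .* reverse(a); k]   # a_{m,i} = a_{m-1,i} + k a_{m-1,m-i}
        P *= 1 - k^2
    end
    return a, P
end
```

The biased autocorrelation estimate keeps the $$R$$ matrix positive semidefinite, so every reflection coefficient stays in $$[-1, 1]$$ and the prediction error power $$P_m$$ never increases with model order.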

And that is it.