jimblog: 2013

Sunday 15 December 2013

Expected Value of a Rayleigh Distribution

Computing the Mean

Rayleigh distributions pop up here and there, but the procedure outlined here for computing the expected value (i.e. the mean) of a distribution can be applied to any probability distribution function, it is just solving an integral. Sometimes these integrals can be pretty tough though, so it is best to use e.g. mathematica to solve them for you.

The Rayleigh Distribution is defined as follows:

\[f(x|\sigma) = \dfrac{x}{\sigma^2} \exp \left( -\dfrac{x^2}{2\sigma^2} \right) \]

The expected value of a distribution \(f(x)\) is defined as follows:

\[ \mathbb{E}[x] = \int x f(x) dx \]

By substituting our desired pdf into the above equation, and solving the integral we get the mean of the distribution:

\[ \mathbb{E}[x] = \int_0^\infty x \dfrac{x}{\sigma^2} \exp \left( -\dfrac{x^2}{2\sigma^2}\right) dx \]

We can now use the Wolfram integral calculator to solve it for us:

\[ \mathbb{E}[x] = \sigma \sqrt{\dfrac{\pi}{2}} \]

We can compare this to the value given on the Rayleigh Distribution page, so we know we got the right answer. Remember, this can be used for any distribution, not just the Rayleigh distribution.

Computing the Variance

Once we have the mean, we can also compute the variance using the following formula:

\[Var(x) = \mathbb{E}[x^2] - \mathbb{E}[x]^2 \]

Substituting in the value of \(\mathbb{E}[x]\) that we have already computed, we have the following equation:

\[Var(x) = \int_0^\infty x^2 \dfrac{x}{\sigma^2} \exp \left( -\dfrac{x^2}{2\sigma^2}\right) dx - \left( \sigma \sqrt{\dfrac{\pi}{2}} \right)^2 \]

Which simplifies very nicely after solving the integral with wolfram:

\[Var(x) = \sigma^2(2-\pi/2)\]

Thursday 12 December 2013

Dictionaries in Julia

I always seem to forget how to use dictionaries in Julia, so I thought I'd cover some of the common use cases here.

Initialisation

Create an empty dictionary:

dict = Dict()
dict["a"] = 1
dict["b"] = 2

The above Dict is of type Dict{Any,Any}, so anything can be used for keys or values. To create an empty dictionary that explicitly maps strings to floats:

d = Dict{String,Float32}()
d["q"] = 3 # works
d[32] = 3 # fails, expected a string key

Create a pre-filled Dict (note the different brace styles):

# infers type information Dict{ASCIIString,Int64} 
# created in this case
dict = ["Jan"=> 1, "Feb"=> 2, "Mar"=> 3] 
# -or-
# creates dict of type Dict{Any,Any}
dict = {"Jan"=> 1, "Feb"=> 2, "Mar"=> 3}

Iterate over the keys of a Dict:

for key = keys(dict)
    println("$key: $(dict[key])")
end
# values(dict) works the same way to extract values

Getting values from a dictionary:

# raises an error if "Dec" is not present
temp = dict["Dec"]   
# returns a default value (0 in this case) 
# if "Dec" is not present.
temp = get(dict,"Dec",0)

Instead of the get() function above, haskey(dict,key) can also be used to check if a key is present. Full documentation is here.

Wednesday 11 December 2013

Breaking Substitution Ciphers Part 1

The 'Substitution cipher' is a fairly well known algorithm for enciphering text, a good description of which can be found here. It is also fairly well known that they can be fairly easily broken using hill climbing approaches.

In this post I'm going to look at a different method of breaking substitution ciphers which is not so dependent on trying lots of different cases until one works. The problem with that approach is that you can't be sure how long your cracker will need to run, perhaps by leaving it a bit longer it will eventually succeed? So I'll treat the sentence somewhat like a HMM, then try and find the most likely sequence of characters that make up the key. This will require relaxing some of the constraints, in this case we will relax the constraint that a ciphertext letter will always decrypt to the same plaintext letter. At the end we'll try and massage everything back into place so that we have the most likely decryption key.

Building the Lattice

Our first task is to build the lattice over which we will be doing all our work, our test sentence will be DEFEND THE EAST enciphered to GIUIFGCEIIPRC. This is a very short sentence which will probably not break due to its length, but hopefully it will illustrate the technique which can be applied to longer sentences.

Our first step is to determine emission probabilities. e.g. G is the first letter of the ciphertext, what is the probability that ciphertext G decrypts to plaintext A? or B? This question will be answered using the multinomial distribution and the counts for each letter. In this case G occurs twice in 13 letters, and we know that A occurs with a frequency of 0.0855. This means the probability of ciphertext G corresponding to plaintext A can be computed as follows:

\[ p(G_{CT} = A_{PT}) = \dfrac{n_{TOT}!}{n_A!~n_{\lnot A}!}p(A)^{n_A}p(\lnot A)^{n_{\lnot A}}\] \[= \dfrac{13!}{2!11!}0.0855^2 \times (1-0.0855)^{11} = 0.2133\]

In the equation above I have simplified the formula a bit by assuming \(x_1\) is the number of times A occurs, and \(x_2\) is the number of times all other characters occur. We then have to repeat this computation for all plaintext characters, for example the probability that G is B is 0.0167, so it is more likely that G decrypts to A than B.

By applying the formula above for every plaintext letter and each ciphertext letter we get a 'lattice' of size \(n\times 26\). It will look something like the following, note that each column has been normalised to sum to 1:

	G	I	U	I	F	G	C	E	I	I	P	R	C
A	0.100	0.125	0.061	0.125	0.061	0.100	0.100	0.061	0.125	0.125	0.061	0.061	0.100
B	0.008	0.000	0.028	0.000	0.028	0.008	0.008	0.028	0.000	0.000	0.028	0.028	0.008
C	0.026	0.004	0.045	0.004	0.045	0.026	0.026	0.045	0.004	0.004	0.045	0.045	0.026
D	0.036	0.008	0.050	0.008	0.050	0.036	0.036	0.050	0.008	0.008	0.050	0.050	0.036
E	0.130	0.350	0.054	0.350	0.054	0.130	0.130	0.054	0.350	0.350	0.054	0.054	0.130
F	0.014	0.001	0.035	0.001	0.035	0.014	0.014	0.035	0.001	0.001	0.035	0.035	0.014
...	...	...	...	...	...	...	...	...	...	...	...	...	...
W	0.010	0.000	0.031	0.000	0.031	0.010	0.010	0.031	0.000	0.000	0.031	0.031	0.010
X	0.000	0.000	0.004	0.000	0.004	0.000	0.000	0.004	0.000	0.000	0.004	0.004	0.000
Y	0.009	0.000	0.029	0.000	0.029	0.009	0.009	0.029	0.000	0.000	0.029	0.029	0.009
Z	0.000	0.000	0.002	0.000	0.002	0.000	0.000	0.002	0.000	0.000	0.002	0.002	0.000

This post is ongoing, I will edit it some more when I get a chance.

Friday 6 December 2013

MMSE Estimators for Speech Enhancement

MMSE (Minimum Mean Square Error) estimators are commonly used in speech enhancement, with many different estimators applied to different speech representations. In its most basic form, an MMSE estimator is used to find the most likely clean signal given a noisy signal. I will start by explaining a time domain MMSE estimator (which won't be very useful), and then we can move on to better ones.

Some notation: \(x_n\) is our clean signal (\(n\) is the sample index), \(y_n\) is our noisy signal, and \(d_n\) will be our noise. We'll assume our noise is additive i.e. \[ y_n = x_n + d_n \]

We will now start applying some assumptions: we assume both \(x_n\) and \(d_n\) are gaussian variables, with different variances. As an assumption, this is not a very good one. Time domain speech is not gaussian, and most real noise is also not gaussian. One of the things this assumption has going for it though is that it results in a very simple estimator.

So I should now explain what an estimator actually is. Given a probability distribution function for a random variable, we want to pick the most likely value the random variable can take. There are two main ways of doing this: MMSE estimators, and MAP estimators. I'll explain MAP estimators in another post, they tend to be simpler to compute but they don't usually perform as well as MMSE estimators.

MMSE estimators are derived as follows: given a pdf \(p(x_n|y_n)\) (this is the probability of the clean signal given the noisy signal) we wish to find \(\hat{x}_n\) (this is called "x hat"),which is our estimate of the clean signal. To do this we find the expected value of the pdf \(p(x_n|y_n)\). We use the following formula for expectation:

\[\hat{x}_n = \int_{-\infty}^{\infty}x_n p(x_n|y_n)dx_n\]

Lets look at the pdfs we know (remember we assumed both \(x\) and \(d\) are gaussian distributed):

\[p(x_n) = \dfrac{1}{\sqrt{2\pi\sigma_x^2}} \exp \left( \dfrac{-x_n^2}{2\sigma_x^2} \right) \] \[p(d_n) = \dfrac{1}{\sqrt{2\pi\sigma_d^2}} \exp \left( \dfrac{-d_n^2}{2\sigma_d^2} \right) \]

Using the 2 pdfs above, we can derive the pdf for \(p(y_n|x_n)\) (the pdf of the noisy signal given the clean signal). Since we assumed \(y_n = x_n + d_n\), if we are given the value \(x_n\), we know \(y_n\) must be close to it with a variance of \(\sigma_d^2\) i.e. the noise variance. This results in:

\[p(y_n|x_n) = \dfrac{1}{\sqrt{2\pi\sigma_d^2}} \exp \left(\dfrac{ -(y_n-x_n)^2 }{ 2\sigma_d^2 } \right) \]

Despite how easy it is to calculate \(p(y_n|x_n)\), it is not the pdf we want. We can compute \(p(x_n|y_n)\) using Bayes rule:

\[p(x_n|y_n) = \dfrac{p(y_n|x_n)p(x_n)}{\int_{-\infty}^\infty p(y_n|x)p(x) dx}\]

The value we actually want is:

\[\hat{x}_n = \int x_n p(x_n|y_n) dx_n = \int \dfrac{x_n p(y_n|x_n)p(x_n)}{\int_{-\infty}^\infty p(y_n|x)p(x) dx} dx_n\]

Now we substitute the expressions for the pdfs into our equations to get a bit of a monster:

\[\hat{x}_n = \int{ \dfrac{x_n \dfrac{1}{\sqrt{2\pi\sigma_d^2}} \exp \left(\dfrac{ -(y_n-x_n)^2 }{ 2\sigma_d^2 } \right) \dfrac{1}{\sqrt{2\pi\sigma_x^2}} \exp \left( \dfrac{-x_n^2}{2\sigma_x^2} \right)}{\int_{-\infty}^\infty \dfrac{1}{\sqrt{2\pi\sigma_d^2}} \exp \left(\dfrac{ -(y_n-x)^2 }{ 2\sigma_d^2 } \right) \dfrac{1}{\sqrt{2\pi\sigma_x^2}} \exp \left( \dfrac{-x^2}{2\sigma_x^2} \right) dx} dx_n}\]

If you want to do it by hand you can simplify the above expression a bit and use a book of integrals to solve it, but the easier way is just to use mathematica to do it for you (or this page which helps with integrals). The result is the following very elegant solution:

\[\hat{x}_n = \dfrac{\sigma_x^2}{\sigma_x^2 + \sigma_d^2}y_n \]

If \(x_n\) and \(d_n\) were actually gaussian random variables, then the estimator we derived above is optimal i.e. no better estimator of \(y_n\) exists. In reality, we would also need to know \(\sigma_d^2\) and \(\sigma_x^2\), and these can be very tricky to estimate. In the next post I will use the same method but different probability distributions to derive other estimators.

In the next post I'll do a slightly better MMSE estimator that doesn't use gaussians. By using more complicated pdfs we can improve our estimator, but we will also get much more complicated integrals. In some cases the integrals won't be analytically solvable at all, so we'll have to be happy approximating them numerically. If this is not possible because of e.g. timing constraints, we'll have to settle on a MAP estimator.

Pages