Sunday 30 November 2014

Framing and reconstructing speech signals

This post will deal with framing and overlap-add resynthesis, also known as AMS (Analysis-Modification-Synthesis) when used for things like speech enhancement. First of all, what is the point of framing? An audio signal is constantly changing, so we assume that on short time scales it doesn't change much. (When we say it doesn't change, we mean statistically, i.e. it is statistically stationary; obviously the samples themselves are constantly changing, even on short time scales.) This is why we frame the signal into 20-40ms frames. If the frame is much shorter we don't have enough samples to get a reliable spectral estimate; if it is much longer the signal changes too much within the frame, and the FFT ends up smearing the frame's contents.

What is involved: frame the speech signal into short, overlapping frames, typically about 25ms long. For a 16kHz sampled audio file, this corresponds to 0.025s * 16,000 samples/s = 400 samples in length. We then use an overlap of 50%, or about 200 samples. This means the first frame starts at sample 0, the second starts at sample 200, the third at 400, etc.
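The arithmetic above can be sketched in code. This is a minimal Python/NumPy version of the framing step (the post's own routines are MATLAB; the function name `frame_signal` and its defaults are just for illustration):

```python
import numpy as np

def frame_signal(signal, frame_len=400, frame_step=200):
    """Split a 1-D signal into overlapping frames, zero-padding the end
    so the signal divides into an integer number of frames."""
    n_frames = int(np.ceil(max(len(signal) - frame_len, 0) / frame_step)) + 1
    padded_len = (n_frames - 1) * frame_step + frame_len
    padded = np.concatenate([signal, np.zeros(padded_len - len(signal))])
    starts = np.arange(n_frames) * frame_step      # 0, 200, 400, ...
    return np.stack([padded[s:s + frame_len] for s in starts])

# 25 ms frames at 16 kHz with 50% overlap
sig = np.arange(1000, dtype=float)
frames = frame_signal(sig)
print(frames.shape)  # (4, 400): frames start at samples 0, 200, 400, 600
```

With a 1000-sample input, the last frame (starting at sample 600) just fits, so no padding is needed; a 1001-sample input would be padded out to the next full frame.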

MATLAB code for framing (frame_sig.m) and deframing (deframe_sig.m).

Framing the signal is pretty simple; the only things to note are that the signal is padded with zeros so that it splits into an integer number of frames, and that a window function is applied to each frame. The overlap-add process is trickier: as well as adding up the overlapped frames, we also accumulate a window correction, which is basically what our signal would be if every frame were just the window. This is important because the windowed frames won't necessarily add up to give the original signal back; dividing by the window correction undoes the effect of the overlapping windows. You can see this by plotting the window_correction variable in deframe_sig.m and thinking about how it gets like that. We also add eps (just a very small constant, i.e. machine epsilon) to the window correction in case it is ever zero; this prevents Infs appearing in our reconstructed signal.
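The overlap-add with window correction described above can be sketched like this (again in Python/NumPy rather than the original MATLAB; `deframe_signal` and the choice of a Hamming window are illustrative):

```python
import numpy as np

def deframe_signal(frames, siglen, frame_step=200):
    """Overlap-add frames back into a signal, dividing out the window."""
    frame_len = frames.shape[1]
    win = np.hamming(frame_len)
    padded_len = (frames.shape[0] - 1) * frame_step + frame_len
    rec = np.zeros(padded_len)
    window_correction = np.zeros(padded_len)
    for i in range(frames.shape[0]):
        s = i * frame_step
        rec[s:s + frame_len] += frames[i]
        # what the signal would look like if every frame were just the window;
        # eps guards against dividing by zero where the windows sum to nothing
        window_correction[s:s + frame_len] += win + np.finfo(float).eps
    return (rec / window_correction)[:siglen]

# round trip: window each frame on analysis, divide the window back out on synthesis
sig = np.sin(np.arange(1600) / 10.0)
win = np.hamming(400)
frames = np.stack([sig[s:s + 400] * win for s in range(0, 1201, 200)])
rec = deframe_signal(frames, 1600)
print(np.max(np.abs(rec - sig)))  # near machine precision
```

Plotting `window_correction` here shows the same shape as in deframe_sig.m: roughly constant in the middle where two Hamming windows overlap, tapering at the two ends where only one frame contributes.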

To see how the AMS framework can be used for spectral subtraction, have a look at this spectral subtraction tutorial. The framing and deframing routines on this page can be used to implement the enhancement routines there. Some example code for the tutorial above would look something like this:
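A sketch of such an AMS spectral-subtraction loop, here in Python/NumPy rather than the original MATLAB (the framing and window-correction logic mirrors frame_sig.m/deframe_sig.m; the assumption that the first `noise_frames` frames are noise-only, and the 1% spectral floor, are my own illustrative choices):

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=400, frame_step=200, nfft=512, noise_frames=6):
    win = np.hamming(frame_len)
    starts = range(0, len(noisy) - frame_len + 1, frame_step)
    frames = np.stack([noisy[s:s + frame_len] * win for s in starts])

    # Analysis: power spectrum of each frame, keeping the noisy phase
    spec = np.fft.rfft(frames, nfft, axis=1)
    power = np.abs(spec) ** 2
    phase = np.angle(spec)

    # Modification: subtract a noise estimate taken from the first few frames,
    # with a small spectral floor so the power never goes negative
    noise_est = power[:noise_frames].mean(axis=0)
    clean_power = np.maximum(power - noise_est, 0.01 * power)

    # Synthesis: cleaned magnitude with the original phase, back to time domain
    clean_frames = np.fft.irfft(np.sqrt(clean_power) * np.exp(1j * phase),
                                nfft, axis=1)[:, :frame_len]

    # overlap-add with window correction, as in deframe_sig.m
    out = np.zeros(len(noisy))
    corr = np.zeros(len(noisy))
    for i, s in enumerate(starts):
        out[s:s + frame_len] += clean_frames[i]
        corr[s:s + frame_len] += win
    return out / (corr + np.finfo(float).eps)
```

Note the synthesis line is the same idea as the MATLAB `ifft(sqrt(clean_spec).*exp(1j*phase),NFFT,2)` discussed in the comments: the modified magnitude is recombined with the unmodified noisy phase before inverting the transform.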


  1. I have some LPCC coefficients. How can I reconstruct the signal using these coefficients?

  2. How come 400 samples here? It must be 320.

  3. I think you forgot to put the imaginary number "j" in line 15, it should be:
    reconstructed_frames = ifft(sqrt(clean_spec).*exp(1j*phase),NFFT,2);