jimblog: Framing and reconstructing speech signals

Sunday, 30 November 2014

Framing and reconstructing speech signals

This post deals with framing and overlap-add resynthesis. The whole process is also known as AMS (Analysis-Modification-Synthesis) when it is used for things like speech enhancement. First of all, what is the point of framing? An audio signal is constantly changing, so we assume that on short time scales it doesn't change much. By "doesn't change" we mean that it is statistically stationary; the samples themselves obviously keep changing even on short time scales. This is why we split the signal into 20-40 ms frames. If the frame is much shorter, we don't have enough samples to get a reliable spectral estimate; if it is longer, the signal changes too much within the frame, and the FFT ends up smearing the contents of the frame.

What is involved: split the speech signal into short, overlapping frames. Typically frames are taken to be about 20-25 ms long. For a 16 kHz sampled audio file, a 25 ms frame corresponds to 0.025 s * 16,000 samples/s = 400 samples (a 20 ms frame would be 0.020 s * 16,000 samples/s = 320 samples). We then use an overlap of 50%, which is 200 samples for a 400-sample frame. This means the first frame starts at sample 0, the second at sample 200, the third at sample 400, and so on.
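The frame arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the MATLAB routines in this post: the function name frame_starts and its parameters are made up for the example, and it assumes the last frame is zero-padded when the signal does not fill it exactly.

```python
import math

def frame_starts(num_samples, fs=16000, frame_ms=25, overlap=0.5):
    """Return (frame_len, step, starts) for a signal of num_samples samples.

    Hypothetical helper for illustration; assumes the final frame is
    zero-padded if the signal does not fill it exactly.
    """
    frame_len = int(round(fs * frame_ms / 1000))   # e.g. 25 ms at 16 kHz -> 400
    step = int(round(frame_len * (1 - overlap)))   # 50% overlap -> hop of 200
    if num_samples <= frame_len:
        num_frames = 1
    else:
        # round up so that every sample is covered by at least one frame
        num_frames = 1 + math.ceil((num_samples - frame_len) / step)
    starts = [i * step for i in range(num_frames)]
    return frame_len, step, starts

print(frame_starts(1000))  # (400, 200, [0, 200, 400, 600])
```

Calling frame_starts(1000, frame_ms=20) gives a frame length of 320 and a step of 160 instead.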

MATLAB code for framing: frame_sig.m, and for unframing: deframe_sig.m.

Framing the signal is pretty simple. The only thing to note is that the signal is padded with zeros so that it makes up an integer number of frames. A window function is also applied to each frame. The overlap-add process has a few things that make it tricky. As well as adding up the overlapped frames, we also add up the window correction, which is basically what our signal would be if every frame were just the window. This is important because the windowed frames won't necessarily add up to give the original signal back; dividing the overlap-added signal by the window correction undoes the effect of the window. You can see this by plotting the window_correction variable in deframe_sig.m and thinking about how it gets that shape. We also have to add eps (a very small constant, i.e. epsilon) to the window correction in case it is ever zero, which prevents infs (from dividing by zero) appearing in the reconstructed signal.
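To make the steps above concrete, here is a rough Python/NumPy sketch of framing and overlap-add. It is not a translation of frame_sig.m and deframe_sig.m: the function names are borrowed from those files, but the argument lists, the index-matrix trick and the choice of a Hamming window in the usage lines are assumptions made for this example.

```python
import numpy as np

def frame_sig(signal, frame_len, step, window):
    """Split a 1-D signal into overlapping, windowed frames (one per row)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((len(signal) - frame_len) / step))
    # zero-pad so that the signal fills an integer number of frames
    padded_len = (num_frames - 1) * step + frame_len
    padded = np.concatenate([signal, np.zeros(padded_len - len(signal))])
    # index matrix: row i holds the sample indices of frame i
    idx = np.arange(frame_len)[None, :] + step * np.arange(num_frames)[:, None]
    return padded[idx] * window

def deframe_sig(frames, signal_len, frame_len, step, window):
    """Overlap-add the frames, then divide out the summed window."""
    num_frames = frames.shape[0]
    padded_len = (num_frames - 1) * step + frame_len
    rec_signal = np.zeros(padded_len)
    window_correction = np.zeros(padded_len)
    for i in range(num_frames):
        start = i * step
        rec_signal[start:start + frame_len] += frames[i]
        # what the signal would be if every frame were just the window
        window_correction[start:start + frame_len] += window
    # eps guards against division by zero where the summed window is zero
    rec_signal = rec_signal / (window_correction + np.finfo(float).eps)
    return rec_signal[:signal_len]   # drop the zero padding again

# Usage: frame and reconstruct 1000 samples of noise (assumed test signal)
x = np.random.default_rng(0).standard_normal(1000)
w = np.hamming(400)
frames = frame_sig(x, 400, 200, w)
y = deframe_sig(frames, len(x), 400, 200, w)
print(frames.shape, np.max(np.abs(x - y)))
```

A Hamming window is used here because it never reaches zero, so the window correction is nonzero everywhere and the unmodified frames reconstruct the input almost exactly. With a window that is zero at its ends, such as np.hanning, the very first sample of the signal cannot be recovered, and that is the situation in which the eps term matters.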

To see how the AMS framework can be used for spectral subtraction, have a look at this spectral subtraction tutorial. The framing and deframing routines on this page can be used to implement the enhancement routines there. Some example code for the tutorial above would look something like this:
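As a stand-in, here is a minimal Python/NumPy sketch of basic power spectral subtraction in the AMS style. It is not the MATLAB listing that belongs to this post or to the tutorial. The function name spectral_subtract, the Hamming window, the FFT length of 512, and above all the assumption that the first few frames contain only noise are choices made for this example.

```python
import numpy as np

def spectral_subtract(noisy, frame_len=400, step=200, nfft=512, noise_frames=3):
    """Basic power spectral subtraction via analysis-modification-synthesis.

    Illustrative sketch only; all parameter names and defaults are assumptions.
    """
    noisy = np.asarray(noisy, dtype=float)
    window = np.hamming(frame_len)

    # Analysis: zero-pad and cut into overlapping, windowed frames
    num_frames = 1 + max(0, int(np.ceil((len(noisy) - frame_len) / step)))
    padded_len = (num_frames - 1) * step + frame_len
    padded = np.concatenate([noisy, np.zeros(padded_len - len(noisy))])
    idx = np.arange(frame_len)[None, :] + step * np.arange(num_frames)[:, None]
    frames = padded[idx] * window

    # Modification: subtract a noise power estimate, keep the noisy phase
    spec = np.fft.fft(frames, nfft, axis=1)
    power = np.abs(spec) ** 2
    phase = np.angle(spec)
    # ASSUMPTION: the first noise_frames frames contain noise only
    noise_power = power[:noise_frames].mean(axis=0)
    clean_power = np.maximum(power - noise_power, 0.0)  # floor negatives at 0
    clean_spec = np.sqrt(clean_power) * np.exp(1j * phase)
    clean_frames = np.real(np.fft.ifft(clean_spec, nfft, axis=1))[:, :frame_len]

    # Synthesis: overlap-add and divide by the summed window
    out = np.zeros(padded_len)
    window_correction = np.zeros(padded_len)
    for i in range(num_frames):
        s = i * step
        out[s:s + frame_len] += clean_frames[i]
        window_correction[s:s + frame_len] += window
    out = out / (window_correction + np.finfo(float).eps)
    return out[:len(noisy)]
```

Calling spectral_subtract(noisy) on a 16 kHz recording that starts with a short stretch of background noise returns an enhanced signal of the same length. Flooring the subtracted power at zero is the simplest possible rule; it is the part most worth experimenting with.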

Comments:

Joshin, 30 May 2015 at 12:44
Very useful. Thank you.
Subrata Paul, 12 March 2019 at 10:08
I have some LPCC coefficients. How can I reconstruct the signal from these coefficients?
Unknown, 2 October 2019 at 02:02
How come 400 samples here? It must be 320.
David Santos, 11 February 2021 at 07:17
I think you forgot to put the imaginary number "j" in line 15, it should be:
reconstructed_frames = ifft(sqrt(clean_spec).*exp(1j*phase),NFFT,2);