Chapter 3. Noise Reduction Algorithm with Entropy-Based Voice Activity
3.6. Filter Bank-Based Spectral Subtraction
3.6.1. Introduction
After the VAD process, information about the current frame is available, and the subsequent noise reduction step is selected based on it. There are four noise reduction states: Voiced-Zone, Too-Short Voiced-Zone, Voice-Protection-Zone, and Silence-Zone. The flow that decides which state to enter is shown in Fig. 3-10 and is described below:
a. If a frame is judged as VAD = 1 and the previous 8 VAD results are also 1, that is, VAD_cnt > 8, then the state is Voiced-Zone. VAD_cnt = VAD_cnt + 1.
b. If a frame is judged as VAD = 1 but case a does not apply, that is, VAD_cnt ≤ 8, then the state is Too-Short Voiced-Zone. VAD_cnt = VAD_cnt + 1.
c. If a frame is judged as VAD = 0 but VAD_cnt ≥ 3, which means the frame comes right after a Voiced-Zone, then the state is Voice-Protection-Zone. VAD_cnt = VAD_cnt - 3.
d. If none of the above conditions holds, the state is Silence-Zone. “Noise Estimation” is performed.
Each data sample is processed under one of the states above. For cases a and c, “Spectral Subtraction for Speech” is performed. For cases b and d, “Spectral Attenuation for Noise” is performed. Finally, the output data sample is generated and fed to the Insertion Gain block.
Fig. 3-10 State decision after VAD
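The state decision in cases a to d can be sketched in Python as follows. This is only an illustrative sketch, not the hardware implementation: the names `State`, `decide_state`, and `run` are hypothetical, the counter is compared before it is updated, the comparisons are taken as VAD_cnt > 8 for case a and VAD_cnt ≥ 3 for case c, and the Silence-Zone is assumed to leave VAD_cnt unchanged because the text does not specify an update for case d.

```python
from enum import Enum


class State(Enum):
    VOICED = "Voiced-Zone"
    TOO_SHORT_VOICED = "Too-Short Voiced-Zone"
    VOICE_PROTECTION = "Voice-Protection-Zone"
    SILENCE = "Silence-Zone"


def decide_state(vad, vad_cnt):
    """Return (state, updated vad_cnt) for one frame.

    Assumption: vad_cnt is tested before it is updated.
    """
    if vad == 1:
        # Cases a and b: both increment the counter.
        state = State.VOICED if vad_cnt > 8 else State.TOO_SHORT_VOICED
        return state, vad_cnt + 1
    if vad_cnt >= 3:
        # Case c: frame right after a voiced run; counter decays by 3.
        return State.VOICE_PROTECTION, vad_cnt - 3
    # Case d: counter left unchanged (assumption, not stated in the text).
    return State.SILENCE, vad_cnt


def run(vad_flags):
    """Apply decide_state to a sequence of VAD flags, starting from vad_cnt = 0."""
    cnt = 0
    states = []
    for flag in vad_flags:
        state, cnt = decide_state(flag, cnt)
        states.append(state)
    return states, cnt
```

Under these assumptions, a run of ten voiced frames followed by silence first passes through the Too-Short Voiced-Zone, reaches the Voiced-Zone once the counter exceeds 8, and then decays through the Voice-Protection-Zone before settling in the Silence-Zone.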
3.6.2. Noise Estimation
When the Silence-Zone state (case d) is entered, “Noise Estimation” is performed. Each time “Noise Estimation” is activated, the data samples in each subband are averaged separately and a counter is incremented. When the counter reaches 128, that is, when noise estimation has been performed 128 times, the noise estimate of each subband is updated with the average of the past 128 per-subband averages, and the counter is reset to zero. This process repeats continuously so that the noise estimate stays up to date and provides appropriate information for the spectral subtraction process.
The result of noise estimation will be the reference noise signal in (3-19), (3-20) and (3-21).
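The block-averaging scheme above can be sketched as follows. The class name `NoiseEstimator` and its interface are hypothetical. The sketch assumes that sample magnitudes are averaged, that every subband delivers at least one sample per frame, and that a running sum is kept instead of storing the 128 individual averages, which gives the same result.

```python
class NoiseEstimator:
    """Per-subband noise estimate, refreshed every `window` Silence-Zone frames."""

    def __init__(self, num_bands=18, window=128):
        # 18 bands corresponds to F22~F39; 128 is the counter limit from the text.
        self.window = window
        self.acc = [0.0] * num_bands       # running sum of per-frame averages
        self.count = 0                     # number of activations since last update
        self.estimate = [0.0] * num_bands  # current reference noise level per band

    def update(self, frame):
        """frame: one list of samples per subband (list lengths may differ)."""
        for i, samples in enumerate(frame):
            # Average the magnitudes in this subband (assumption: magnitudes).
            self.acc[i] += sum(abs(x) for x in samples) / len(samples)
        self.count += 1
        if self.count == self.window:
            # Average over the past `window` per-frame averages, then restart.
            self.estimate = [a / self.window for a in self.acc]
            self.acc = [0.0] * len(self.acc)
            self.count = 0
        return self.estimate
```

Between updates the previous estimate stays in use, so the spectral subtraction always has a reference value available.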
3.6.3. Spectral Attenuation for Noise
For case d, after noise estimation, the data samples that enter the Silence-Zone state are attenuated. For case b, the data samples that enter the Too-Short Voiced-Zone state are also attenuated, because they tend to be noise given that their VAD period is too short.
The Silence-Zone mechanism simply attenuates each data sample in every subband by multiplying it by 0.125, i.e.:
$$Y_{i,k} = 0.125 \cdot X_{i,k} \tag{3-16}$$
where $Y_{i,k}$ is the kth processed subband data sample in band i and $X_{i,k}$ is the kth input subband data sample in band i, i = F22~F39. Note that the range of k depends on the subband.
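Similarly, the Too-Short Voiced-Zone mechanism attenuates each data sample in every subband by multiplying it by 0.25, i.e.:
$$Y_{i,k} = 0.25 \cdot X_{i,k} \tag{3-17}$$
where $Y_{i,k}$ and $X_{i,k}$ are again the kth processed and input subband data samples in band i.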
Then the output data samples are acquired.
3.6.4. Spectral Subtraction for Speech
As described in 3.6.1, when the Voiced-Zone (case a) or Voice-Protection-Zone (case c) state is entered, spectral subtraction for speech is performed to remove unwanted noise from the speech signal. We first express the time-domain speech signal corrupted by noise as:
$$x(n) = s(n) + d(n) \tag{3-18}$$
where $x(n)$ is the noisy speech data sample, $s(n)$ is the original speech data sample, and $d(n)$ is the noise data sample. After filtering by the filter bank, the relationship may be expressed by the following equation:
$$|X_{i,k}| = |S_{i,k}| + |D_i| \tag{3-19}$$
where $X_{i,k}$ is the kth subband noisy speech data sample in band i, i = F22~F39, $S_{i,k}$ is the kth subband original speech data sample in band i, and $D_i$ is the subband noise data sample in band i, which is estimated in 3.6.2. Note that the range of k depends on the subband.
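To simplify the calculation, the clean speech can be estimated by the following equation:
$$|\hat{S}_{i,k}| = |X_{i,k}| - |D_i| \tag{3-20}$$
where $\hat{S}_{i,k}$ denotes the estimate of the original speech data sample.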
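The speech data sample is then estimated by the following equation:
$$|\hat{S}_{i,k}| = |X_{i,k}| - \alpha_i |D_i| \tag{3-21}$$
where $\alpha_i$ is a constant that varies across subbands and is defined as:
$$\alpha_i = \begin{cases} 2.5, & i = F22 \sim F30 \\ 1, & i = F31 \sim F39 \end{cases} \tag{3-22}$$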
The reason the factor $\alpha_i$ differs across subbands is that, for bands at frequencies lower than F30, the energy is closer to that of the human voice, while at frequencies higher than F30 the energy may be unwanted noise during speech. Thus we apply a higher subtraction factor to those subbands.
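Next, following [25], to avoid negative values resulting from (3-21), $|\hat{S}_{i,k}|$ is floored as follows:
$$|\tilde{S}_{i,k}| = \begin{cases} |\hat{S}_{i,k}|, & |\hat{S}_{i,k}| > \beta |X_{i,k}| \\ \beta |X_{i,k}|, & |\hat{S}_{i,k}| \le \beta |X_{i,k}| \end{cases} \tag{3-23}$$
where $\beta$ is set to 0.05. Finally, to mask musical noise, a small amount of the noisy data sample is added to the processed data sample, i.e.: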
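$$|\bar{S}_{i,k}| = |\tilde{S}_{i,k}| + \gamma |X_{i,k}| \tag{3-24}$$
where $|\tilde{S}_{i,k}|$ is the floored estimate from (3-23), $X_{i,k}$ is the noisy subband data sample, and $\gamma$ is a small constant.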
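So the processed data sample in the speech period is:
$$Y_{i,k} = \operatorname{sgn}(X_{i,k}) \cdot |\bar{S}_{i,k}| \tag{3-25}$$
where $|\bar{S}_{i,k}|$ is the result of (3-24), and the multiplication by $\operatorname{sgn}(X_{i,k})$ ensures that $Y_{i,k}$ has the same sign as $X_{i,k}$. The above equation gives the output of spectral subtraction.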
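The chain from (3-21) to (3-25) can be illustrated for a single subband sample with the Python sketch below. It is written under stated assumptions rather than taken from the design: the function names are hypothetical, the subtraction is assumed to operate on sample magnitudes, and the masking coefficient `gamma` used for (3-24) is a placeholder value chosen only for illustration.

```python
import math


def alpha(band):
    """Subtraction factor per (3-22); `band` is the filter index 22..39."""
    return 2.5 if 22 <= band <= 30 else 1.0


def spectral_subtract(x, noise_mag, band, beta=0.05, gamma=0.05):
    """Process one noisy subband sample x given the band's noise magnitude.

    gamma is a hypothetical masking coefficient; beta = 0.05 is from the text.
    """
    mag = abs(x)
    est = mag - alpha(band) * noise_mag  # (3-21): weighted subtraction
    floor = beta * mag
    if est <= floor:                     # (3-23): floor small or negative values
        est = floor
    est += gamma * mag                   # (3-24): add back a little noisy signal
    return math.copysign(est, x)         # (3-25): restore the input's sign
```

For example, with a noise magnitude of 10 in band 25, an input of 100 becomes 100 - 25 = 75 before masking, while an input of 10 would go negative and is therefore held at the floor of 0.5.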
3.6.5. Low-Power Hardware Optimizations for Filter Bank-Based Spectral Subtraction
The spectral attenuation and spectral subtraction in the proposed algorithm require multiplication by constants. To reduce complexity and circuit area, each multiplication is approximated by a linear combination of the multiplicand scaled by powers of 1/2. The approximation is applied in (3-21), (3-23), and (3-24). For example, 0.1 can be approximated as follows:
$$0.1 \approx 0.09375 = 0.125 - 0.03125 = 2^{-3} - 2^{-5} \tag{3-26}$$
As a result, the hardware architecture can be simplified.
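In integer arithmetic, scaling by a power of 1/2 is a right shift, so the approximation in (3-26) reduces to two shifts and one subtraction. The short sketch below illustrates this for the 0.1 example. The function name is hypothetical, and Python's `>>` rounds toward negative infinity, which may differ from the rounding used in the actual datapath.

```python
def mul_0p1_shift_add(x):
    """Approximate 0.1 * x for an integer x as x * (2**-3 - 2**-5)."""
    return (x >> 3) - (x >> 5)


def approx_error(x):
    """Difference between the exact product 0.1 * x and the approximation."""
    return 0.1 * x - mul_0p1_shift_add(x)
```

For x = 3200 the sketch gives 400 - 100 = 300, compared with the exact product 320. The same idea covers the attenuation factors 0.125 and 0.25, which are single right shifts by 3 and 2 bits.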
3.6.6. Off Mechanism
Under high-SNR conditions, the noise reduction process may be unnecessary and can cause additional quality loss in the processed speech. Thus a simple off mechanism is applied. When the magnitude of the noise estimate in each subband is lower than 0x0001 (in 16-bit format), the off counter is incremented by 1. If the off counter exceeds 0x000f, the noise suppression and spectral subtraction blocks are not entered. This also reduces the amount of computation, decreasing the total power consumption.
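A minimal sketch of the off mechanism is given below. The class name `OffMechanism` is hypothetical. The sketch assumes that the counter is incremented only when every subband estimate is below the threshold, and that the counter is never reset, since the text does not describe a reset condition.

```python
OFF_THRESHOLD = 0x0001  # noise magnitude threshold (16-bit format)
OFF_LIMIT = 0x000F      # bypass once the off counter exceeds this value


class OffMechanism:
    def __init__(self):
        self.off_counter = 0

    @property
    def bypass(self):
        """True when noise suppression and spectral subtraction are skipped."""
        return self.off_counter > OFF_LIMIT

    def update(self, noise_estimate):
        """noise_estimate: one integer noise magnitude per subband."""
        # Assumption: all subbands must be below the threshold to count.
        if all(abs(d) < OFF_THRESHOLD for d in noise_estimate):
            self.off_counter += 1
        return self.bypass
```

With these assumptions, the bypass becomes active on the sixteenth update in which all subband estimates are zero.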
3.6.7. Data Output
Finally, the output of the proposed noise reduction algorithm is sent to the “Insertion Gain” block in the same 16-bit format as the input data samples. The output data samples are aligned in the same manner as the input data samples fed to the proposed noise reduction block by the analysis filter bank.
3.7. Summary
In this chapter, the proposed noise reduction algorithm is introduced. The entropy-based VAD step is performed on every 4 input sequences by calculating the entropy of the input sequence. The entropy calculation is hardware-optimized by changing the logarithm base and using linear interpolation. The filter bank-based spectral subtraction operates on the frequency bands produced by the filter bank, applying spectral attenuation for noise during silence periods and spectral subtraction during speech periods. The spectral subtraction process is also hardware-optimized in order to achieve low power consumption. In addition, an off mechanism is applied to reduce computation power. The simulation and analysis will be discussed in the next chapter.