Final Report for Introduction to Digital Speech Processing: Paper Survey and Reading
B95902085 王舜玄
senior @ NTU Department of Computer Science & Information Engineering
Main Paper: (full reading)
Speech Enhancement Based on Minimum Mean-Square Error Estimation and Supergaussian Priors
Martin, R. Speech and Audio Processing, IEEE Transactions on Volume 13, Issue 5, Sept. 2005 Page(s): 845 - 856
Reference Papers: (selected reading)
Optimal estimators for spectral restoration of noisy speech Porter, J.; Boll, S. Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '84.
Speech enhancement based on a priori signal to noise estimation Scalart, P.; Filho, J.V. Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on Volume 2, 7-10 May 1996 Page(s): 629 - 632 vol. 2
Motivation:
In the advanced topics of this semester, I learned many speech processing techniques, such as using mathematical equations to decompose the relevant matrices, or mean and variance adjustment to recover from the effect of noise. The topic of noise recovery is fascinating because we are always exposed to noisy environments. Whenever we want to collect some information, we are faced with the problem of noise.
This paper was published in an IEEE Transactions journal. Earlier work on noise processing makes some strict assumptions and still performs well; this paper adopts more general assumptions and uses more models to discuss the performance in terms of distortion.
Introduction:
The fourth speech enhancement example mentioned in class is Wiener filtering. Taking the signal and system model into consideration, the class slides present a basic additive noise model and build the Wiener filter by using the Fourier Transform to estimate clean speech from noisy speech. An important assumption of this model is that the clean signal obeys a Gaussian probability density function. Another assumption is that the real part and the imaginary part of each Fourier coefficient are statistically independent.
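As a minimal sketch of the Wiener filter described above (not the exact formulation of the class slides or of the paper), the gain applied to each noisy Fourier coefficient can be written with the speech and noise variances of that frequency bin. The function names `wiener_gain` and `enhance_coefficient` and the numeric values are assumptions chosen only for illustration.

```python
def wiener_gain(speech_var, noise_var):
    """Wiener gain for one frequency bin: var_s / (var_s + var_n).

    Equivalent to xi / (1 + xi), where xi = var_s / var_n is the a priori SNR.
    """
    if speech_var < 0 or noise_var <= 0:
        raise ValueError("speech_var must be >= 0 and noise_var must be > 0")
    return speech_var / (speech_var + noise_var)


def enhance_coefficient(noisy_coeff, speech_var, noise_var):
    """Estimate the clean coefficient S from Y = S + N by scaling Y."""
    return wiener_gain(speech_var, noise_var) * noisy_coeff


# Hypothetical noisy Fourier coefficient and variances (illustration only).
y = complex(2.0, -1.0)
s_hat = enhance_coefficient(y, speech_var=3.0, noise_var=1.0)
print(s_hat)
```

The gain is real-valued and lies between 0 and 1, so the Wiener filter only attenuates each coefficient and leaves its phase unchanged.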
But these assumptions do not always hold. They seem reasonable only under the condition that the Fourier Transform coefficients of clean speech spread over a much shorter and more limited range among the whole set of spectral coefficients. For example, if we apply a Discrete Fourier Transform over a short interval, the independence turns out not to be true.
Therefore, the author considers more models than the Gaussian probability density function alone. This paper discusses the two-sided exponential (Laplace) and two-sided Gamma density models for the probability density function, and then uses these models to build analytical solutions to minimum mean-square error estimation. Now we have 3 model choices for the speech prior and 3 model choices for the noise. Therefore, there are 9 combinations to discuss. The experimental results show that these two models yield less distortion than the Gaussian density alone.
Modeling Process:
The rest of the paper puts great effort into the mathematical equations derived from the model assumptions. The paper first lists the probability density functions of the Gaussian, Laplace and Gamma models. It then plots histograms of the DFT coefficients with the experimental parameters window size = 256 and sampling frequency = 8 kHz, and estimates their a priori SNR as well.
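The three densities can be sketched as follows. This is a minimal illustration that uses the standard textbook forms for a real-valued variable with standard deviation `sigma`; the paper's own notation and scaling for the real and imaginary parts may differ, and the function names are my assumptions.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplace_pdf(x, sigma):
    """Two-sided exponential (Laplace) density with standard deviation sigma."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def gamma_pdf(x, sigma):
    """Two-sided Gamma density (shape 1/2) with standard deviation sigma.

    The density diverges at x = 0, so infinity is returned there.
    """
    if x == 0:
        return math.inf
    coeff = 3.0 ** 0.25 / (2.0 * math.sqrt(2.0 * math.pi * sigma))
    return coeff * abs(x) ** -0.5 * math.exp(-math.sqrt(3.0) * abs(x) / (2.0 * sigma))


# Compare the tails of the three unit-variance densities.
for x in (0.5, 2.0, 4.0):
    print(x, gaussian_pdf(x, 1.0), laplace_pdf(x, 1.0), gamma_pdf(x, 1.0))
```

With equal variance, the Laplace and Gamma densities are more peaked near zero and have heavier tails than the Gaussian density, which is the "supergaussian" property in the paper's title.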
As another metric, the author also applies the Kullback-Leibler distance between the histogram's probability density function and each of the above model densities. The result shows that the KL distance for the Laplace density is 1/3 of that for the Gaussian density, and the KL distance for the Gamma density is 1/6 of that for the Gaussian density. In addition, a linear combination of 0.7 Laplace density and 0.3 Gamma density gives an even better match.
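The histogram comparison can be sketched as below. This is only an illustration under my own assumptions: the samples are synthetic unit-variance Laplace values standing in for real speech coefficients, the bin range and bin count are arbitrary, and the helper name `kl_histogram_vs_model` is hypothetical. It does not reproduce the 1/3 and 1/6 ratios reported in the paper.

```python
import math
import random


def gaussian_pdf(x, sigma=1.0):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplace_pdf(x, sigma=1.0):
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def kl_histogram_vs_model(samples, model_pdf, lo=-4.0, hi=4.0, bins=40):
    """Discrete KL distance sum(p * log(p / q)) between a histogram p and a model q."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), bins - 1)] += 1
    total = sum(counts)
    # Model probability per bin, approximated at the bin centre and renormalised.
    q = [model_pdf(lo + (i + 0.5) * width) for i in range(bins)]
    q_sum = sum(q)
    kl = 0.0
    for c, q_i in zip(counts, q):
        if c > 0:  # empty bins contribute nothing to the sum
            p_i = c / total
            kl += p_i * math.log(p_i / (q_i / q_sum))
    return kl


rng = random.Random(0)
# Synthetic stand-in data (assumption): unit-variance Laplace samples.
samples = [rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2.0)) for _ in range(20000)]
kl_gauss = kl_histogram_vs_model(samples, gaussian_pdf)
kl_laplace = kl_histogram_vs_model(samples, laplace_pdf)
print("KL vs Gaussian:", kl_gauss)
print("KL vs Laplace: ", kl_laplace)
```

A smaller KL distance means the model density matches the histogram more closely, which is how the paper ranks the Gaussian, Laplace and Gamma models.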
Estimation for MMSE:
The author writes down the formula for E{S|Y}, where S is the clean speech coefficient, described by the speech prior, and Y is the observed noisy signal. The author then discusses the 3 types of speech prior mentioned above under the Gaussian noise assumption. Inserting the formula of each speech prior into the estimation equation yields a long and complicated closed-form estimator.
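To see what E{S|Y} means without the long closed-form answers, the conditional mean can be evaluated numerically. The sketch below is my own simplification, not the paper's analytical solution: it treats one real-valued component, assumes Gaussian noise, and uses a plain Riemann sum. The function name `mmse_estimate` and the grid parameters are assumptions.

```python
import math


def gaussian_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def laplace_pdf(x, var):
    b = math.sqrt(var / 2.0)  # Laplace scale giving variance var
    return math.exp(-abs(x) / b) / (2.0 * b)


def mmse_estimate(y, prior_pdf, speech_var, noise_var, half_width=12.0, steps=24001):
    """Numerically evaluate E{S | Y = y} for Y = S + N with Gaussian noise N.

    E{S|y} = integral of s p(y|s) p(s) ds divided by integral of p(y|s) p(s) ds.
    """
    limit = half_width * math.sqrt(speech_var)
    h = 2.0 * limit / (steps - 1)
    num = 0.0
    den = 0.0
    for i in range(steps):
        s = -limit + i * h
        w = gaussian_pdf(y - s, noise_var) * prior_pdf(s, speech_var)
        num += s * w
        den += w
    return num / den  # the grid step h cancels in the ratio


# Hypothetical observation and variances (illustration only).
y = 3.0
print("Gaussian prior:", mmse_estimate(y, gaussian_pdf, 1.0, 1.0))
print("Laplace prior: ", mmse_estimate(y, laplace_pdf, 1.0, 1.0))
```

With a Gaussian prior the result coincides with the Wiener filter, var_s / (var_s + var_n) times y, which is a useful sanity check. With a Laplace prior the estimator is no longer a linear function of y.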
Oh no, 3 huge answers.
After discussing Gaussian noise with the 3 types of speech prior, the paper takes Laplace noise and Gamma noise into consideration. Thus, we have 9 types of answers from the combinations of 3 speech priors and 3 noise models. The author plots the estimator for all these cases. In most cases, the newly proposed models give better estimates than the Wiener filter.
Experimental Discussion:
In most cases, changing the speech prior or the noise to a Laplace or Gamma model performs better than using the Gaussian model for both (unchanged).
However, some cases involving the Laplace model are not confirmed to give better estimates than the Gaussian model, although they provide better residual noise quality. Some parameters also affect the result.
Conclusion:
Although some established mathematical models are useful in many cases, we can change some of the customary rules, which leads to more complicated theory but better performance. Laplace and Gamma density functions are not the only modeling options and do not guarantee a better solution in every case. There are more and more solutions waiting for us to discover.
Other Reflections:
I wanted to study some material related to noise adjustment at first, so I chose this paper for my final report. During the reading, I did not find the paper too perplexing to understand. Instead, this paper emphasizes the author's particular proposal, which brings a new mathematical model to what we were used to doing.
The mathematical formulas for the probability density functions are not easy to remember or derive. I wrote down the omitted steps between one formula and the next when I read the first discussion of the Gaussian and Laplace densities. I also read the references to understand how and why this paper makes these estimations.
While reading this paper, I also wanted to know why such an assumption can be made every time I faced a sentence including "assume".
Why can we assume they are stationary? Why can we suppose they are independent? I tried to search the references for answers to these questions, but I did not find any. Let me "assume" that these assumptions cover some special cases but are not general enough. That's why researchers are needed.
The last thing I learned from this paper is the attitude toward research. Many students (especially at NTU) want to do brand-new work that no one else has proposed. It may be a new algorithm for a problem, a new usage of a technology, etc. But many details in existing knowledge that we overlook leave a lot to be improved. Take this paper for example. Many people research how to enhance noise adjustment systems or how to make innovative use of this technique, and take these assumptions for granted. But when we study an existing technique or paper carefully, it leaves a lot to be continued. Some assumptions may have been made because the original author wanted to simplify the problem. We can examine these small details carefully: generalize them, extend them, or speed up their implementation. Much great knowledge hides in omitted details.