
5. Conclusion and Future Work

5.2 Future Work

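The experimental results in the previous chapter make clear that the temporal-envelope modulation method proposed in this thesis still leaves considerable room for improvement: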

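• Performance varies widely with the input utterance and the noise type. Future work must design parameters that adapt to different signal-to-noise ratios (SNRs) and different background noises.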

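• For background noise whose rate distribution resembles that of speech (for example, babble noise), parameter settings are needed that discriminate between the two more effectively while, given the constraint on computation time, avoiding high-dimensional analysis.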

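• Residual musical noise must be reduced further without introducing speech distortion. In the current experiments, frames classified as non-speech were smoothed with a low-pass filter, but this distorted speech in frames that had been misclassified as non-speech, which lowered the PESQ score and increased the Itakura-Saito (IS) distance in the evaluation.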

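• The method should achieve stable, good performance without being combined with a Wiener filter, or it could be combined with other methods in pursuit of further gains. For example, the method proposed in this thesis does not process the phase, so it might be combined with phase spectrum compensation [28].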

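• Computation time must still be shortened as much as possible by reducing the number of operators and the number of frames. We plan to classify each time-frequency unit and to mask all units judged to be non-speech by multiplying them by a single very small value, so that a mask value need not be computed for every time-frequency unit. This should save considerable computation time.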
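The last item above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the floor value `FLOOR`, the SNR-threshold detector, and the Wiener-like gain standing in for the per-unit mask are all assumptions made for illustration. The point is only that the mask is evaluated on the units classified as speech, while every other unit receives the same small constant.

```python
import numpy as np

FLOOR = 0.01  # hypothetical uniform gain applied to non-speech units


def per_unit_gain(power, noise_power):
    """Stand-in for the (costly) per-unit mask; a Wiener-like gain is assumed."""
    snr = np.maximum(power / noise_power - 1.0, 0.0)
    return snr / (snr + 1.0)


def masked_spectrogram(mag, noise_power, threshold_db=3.0):
    """Mask a magnitude spectrogram of shape (frequency, time).

    Units whose power exceeds the noise estimate by more than threshold_db
    are treated as speech (a hypothetical detector) and get a computed gain.
    All remaining units are multiplied by FLOOR without computing a mask.
    """
    power = mag ** 2
    noise_b = np.broadcast_to(noise_power, mag.shape)
    is_speech = 10.0 * np.log10(power / noise_b + 1e-12) > threshold_db

    gain = np.full(mag.shape, FLOOR)
    # The mask is computed only on the speech units.
    gain[is_speech] = per_unit_gain(power[is_speech], noise_b[is_speech])
    return gain * mag, is_speech


if __name__ == "__main__":
    mag = np.array([[1.0, 10.0], [0.5, 4.0]])
    noise = np.ones((2, 1))  # one noise-power estimate per frequency channel
    enhanced, decisions = masked_spectrogram(mag, noise)
    print(decisions)
    print(enhanced)
```

The saving grows with the fraction of units classified as non-speech. As noted for the low-pass smoothing experiment, any speech unit misclassified as non-speech is attenuated by the floor, so the detector threshold trades computation time against speech distortion.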

References

[1] Tai-Shih Chi, Powen Ru and Shihab A. Shamma, “Multiresolution spectrotemporal analysis of complex sounds” J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887-906, 2005.

[2] H. Sheikhzadeh, R. L. Brennan and H. Sameti, “Real-time implementation of HMM-based MMSE algorithm for speech enhancement in hearing aid applications” Proc. IEEE ICASSP, pp. 808-811, 1995.

[3] Tai-Shih Chi, class notes of Auditory and Acoustical Information Processing, Department of Communication Engineering, National Chiao-Tung University, Taiwan, 2011.

[4] Nima Mesgarani and Shihab Shamma, “Denoising in the domain of spectrotemporal modulations” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007.

[5] Tai-Shih Chi, Ting-Han Lin and Chung-Chien Hsu, “Spectro-temporal modulation energy based mask for robust speaker identification” J. Acoust. Soc. Am. 131 (5), pp. 368-374, 2012.

[6] Chung-Chien Hsu, Ting-Han Lin and Tai-Shih Chi, “FFT-based spectro-temporal analysis and synthesis of sounds” Proc. IEEE ICASSP, pp. 5388-5391, 2011.

[7] Chung-Chien Hsu, Tse-En Lin, Jian-Hueng Chen and Tai-Shih Chi, “Spectro-temporal subband Wiener filter for speech enhancement” Proc. IEEE ICASSP, pp. 4001-4004, 2012.

[8] P. C. Loizou, Speech Enhancement: Theory and Practice (CRC, New York, 2007).

[9] Antony W. Rix, John G. Beerends, Michael P. Hollier and Andries P. Hekstra, “Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs” Proc. IEEE ICASSP, pp. 749-752, 2001.

[10] B.-H. Juang, “On the Itakura-Saito measures for speech coder performance evaluation” AT&T Bell Laboratories Technical Journal, vol. 63, no. 8, pp. 1477-1499, 1984.

[11] S. M. Schimmel and L. E. Atlas, “Coherent envelope detection for modulation filtering of speech” Proc. IEEE ICASSP, vol. 1, pp. 221-224, 2005.

[12] Sofia Ben Jebara, “A perceptual approach to reduce musical noise phenomenon with Wiener denoising technique” Proc. IEEE ICASSP, pp. 49-52, 2006.

[13] Md. Jahangir Alam, Sid-Ahmed Selouani and Douglas O’Shaughnessy, “An improved perceptual speech enhancement technique employing a psychoacoustically motivated weighting factor” IEEE ASRU, pp. 266-270, 2009.

[14] A. Amehraye, D. Pastor and A. Tamtaoui, “Perceptual improvement of Wiener filtering” Proc. IEEE ICASSP, pp. 2081-2084, 2008.

[15] Chang Huai You, Soo Ngee Koh and Susanto Rahardja, “An MMSE speech enhancement approach incorporating masking properties” Proc. IEEE ICASSP, pp. 725-728, 2004.

[16] Firas Jabloun and Benoit Champagne, “Incorporating the human hearing properties in the signal subspace approach for speech enhancement” IEEE Trans. Acoustics, Speech, Audio Processing, Vol. 11, No. 6, pp. 700-708, 2003.

[17] Chang Huai You, Susanto Rahardja and Soo Ngee Koh, “Perceptual Kalman filtering enhancement” Proc. IEEE ICASSP, pp. 461-464, 2006.

[18] Y. Hu and P. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms” Speech Communication, vol. 49, pp. 588-601, 2007.

[19] H. Hirsch and D. Pearce, “The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions” ISCA ITRW ASR2000, Paris, France, pp. 18-20, 2000.

[20] Yi Hu and Philipos C. Loizou, “A perceptually motivated approach for speech enhancement” IEEE Trans. Acoustics, Speech, Audio Processing, Vol. 11, No. 5, pp. 457-465, 2003.

[21] Te-Won Lee and Kaisheng Yao, “Speech enhancement by perceptual filter with sequential noise parameter estimation” Proc. IEEE ICASSP, pp. 693-696, 2004.

[22] Hong You and Abeer Alwan, “Temporal modulation processing of speech signals for noise robust ASR” Proc. Interspeech, pp. 36–39, 2009.

[23] Kuen-Shian Tsai, Li-Hui Tseng, Cheng-Jung Wu and Shuenn-Tsong Young, “Development of a Mandarin monosyllable recognition test” Ear & Hearing, vol. 30, No. 1, pp. 90-99, 2009.

[24] Rainer Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics” IEEE Trans. Acoustics, Speech, Audio Processing, Vol. 9, No. 5, pp. 504-512, 2001.

[25] Thomas Esch and Peter Vary, “Exploiting temporal correlation of speech and noise magnitude using a modified Kalman filter for speech enhancement” ITG-Fachtagung Sprachkommunikation, pp. 8-10, 2008.

[26] Esfandiar Zavarehei and Saeed Vaseghi, “Speech enhancement in temporal DFT trajectories using Kalman filters” Proc. INTERSPEECH, Lisbon, Portugal, pp. 2077-2080, 2005.

[27] Thomas Esch and Peter Vary, “Speech enhancement using a modified Kalman filter based on complex linear prediction and supergaussian priors” Proc. IEEE ICASSP, Las Vegas, USA, pp. 4877-4880, 2008.

[28] Kamil Wojcicki, Mitar Milacic, Anthony Stark, James Lyons and Kuldip Paliwal, “Exploiting conjugate symmetry of the short-time Fourier spectrum for speech enhancement” IEEE Signal Process. Lett., vol. 15, pp. 461-464, 2008.

[29] Stephen So, Kamil K. Wojcicki, James G. Lyons, Anthony P. Stark and Kuldip K. Paliwal, “Kalman filter with phase spectrum compensation algorithm for speech enhancement” Proc. IEEE ICASSP, pp. 4405-4408, 2009.

[30] Esfandiar Zavarehei, Saeed Vaseghi and Qin Yan, “Speech enhancement with Kalman filtering the short-time DFT trajectories of noise and speech” EURASIP, 2006.

[31] Afshin Rezayee and Saeed Gazor, “An adaptive KLT approach for speech enhancement” IEEE Trans. Acoustics, Speech, Audio Processing, Vol. 9, No. 2, pp. 87-95, 2001.

[32] Stephen So and Kuldip Paliwal, “Suppressing the influence of additive noise on the Kalman filter gain for low residual noise speech enhancement” Speech Comm. 53 (3), pp. 355-378, 2010.

[33] S. D. Apte and Shridhar, “An efficient speech enhancement algorithm using conjugate symmetry of DFT” Electrical Engineering and Control, LNEE 98, pp. 695-701, 2011.

[34] Ching-Ta Lu, Kun-Fu Tseng and Chih-Tsung Chen, “Reduction of residual noise using directional median filter” IEEE CSAE, pp. 475-479, 2011.

[35] Sriram Ganapathy, Samuel Thomas and Hynek Hermansky, “Temporal envelope subtraction for robust speech recognition using modulation spectrum” IEEE ASRU, pp. 164-169, 2009.

[36] James G. Lyons and Kuldip K. Paliwal, “Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement” Proc. INTERSPEECH, pp. 387-390, 2008.

[37] Tiago H. Falk, Svante Stadler, W. Bastiaan Kleijn and Wai-Yip Chan, “Noise suppression based on extending a speech-dominated modulation band” Proc. INTERSPEECH, pp. 970-973, 2007.

[38] Wen-Rong Wu and Po-Cheng Chen, “Subband Kalman filtering for speech enhancement” IEEE Trans., Analog and Digital Signal Processing, Vol. 45, No. 8, 1998.

[39] Kuldip Paliwal, Kamil Wójcicki and Belinda Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain” Speech Comm. 52 (5), pp. 450-475.

[40] Stephen So and Kuldip K. Paliwal, “Modulation-domain Kalman filtering for single-channel speech enhancement” Speech Comm. 53 (6), pp. 818-829, 2011.

[41] Mark Marzinzik and Birger Kollmeier, “Speech pause detection for noise spectrum estimation by tracking power envelope dynamics” IEEE Trans. Acoustics, Speech, Audio Processing, Vol. 10, No. 2, pp. 109-118, 2002.

[42] Jing-Dong Chen, Jacob Benesty, Yiteng (Arden) Huang and Simon Doclo, “New insights into the noise reduction Wiener filter” IEEE Trans. Audio, Speech, Language Processing, Vol. 14, No. 4, pp. 1218-1234, 2006.
