Conclusions and Future Work - A Study on Modulation Spectrum Factorization Techniques for Robust Speech Feature Extraction

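In this thesis, we investigated how to apply non-negative matrix factorization (NMF) to the modulation spectra of speech features, and further discussed how to extract more robust information within this framework. The contributions fall into two main parts: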

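1. Extensions of NMF, in two directions:

a. The first extension clusters the training data so that utterances with different characteristics are processed separately, which describes the important local information in the speech modulation spectrum at a finer granularity. When clustering is applied and global and local information are used together, speech recognition accuracy improves considerably; combined with CMVN, the method can even surpass the recognition performance of AFE. We also examined how to choose the number of clusters. The results show that too many clusters sharply degrade recognition performance, because each cluster is then left with insufficient training data.

b. The second extension concerns sparse NMF, in which sparseness is imposed on either each column or each row of the basis matrix. The results show that sparsifying each column works better and clearly benefits speech recognition. Used on its own, sparse NMF improves on the original NMF by about 10%. We also studied the sparseness parameter of the modulation spectrum normalization method based on sparse NMF: the lower the sparseness, the higher the redundancy among the basis vectors, and the higher the sparseness, the less information is duplicated among them. Recognition results degrade when the sparseness is either too high or too low, so the best value has to be found by empirical tuning.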

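2. An extension of compressive sensing. This thesis proposes a novel idea: applying compressive sensing to the modulation spectrum, so that the data are reconstructed from a linear combination of a small number of closely related training samples.

The experimental results also confirm that the low-frequency part of the modulation spectrum carries most of the information that matters for speech recognition. Therefore, when the NMF basis vectors are concentrated in the low-frequency region and the high-frequency region is free of noise interference, the recognition rate improves noticeably.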

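Future work can be discussed along two directions:

1. Apply additional processing to the training data and attach other useful information to it. For example, the training data could be grouped by difficulty or by degree of consistency and, imitating human learning strategies, presented from easy to hard during training, in the hope that NMF-related methods converge to a better solution.

2. Study different sparse NMF methods, compare the strengths and weaknesses of different sparseness constraints such as the L1-norm, L2-norm, and L0-norm, and validate them on larger amounts of data.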


Appendix

Appendix 1: Derivation of the Non-negative Matrix Factorization Update Rules

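With the concept of non-negative matrix factorization (NMF) in place, we now discuss in detail how its update rules are derived.

(1) Derivation of the update rules

Starting from the least-squares loss function, the update rule for H under the Euclidean distance can be rewritten as:

$$H_{im} \leftarrow H_{im} + \eta_{im}\left[(W^T V)_{im} - (W^T W H)_{im}\right]$$

When $\eta_{im}$ is a small positive value, Eq. (3-3) can be regarded as conventional gradient descent, and when $\eta_{im}$ is close to 0 the update also reduces the loss $\|V - WH\|$. Now set $\eta_{im}$ to:

$$\eta_{im} = \frac{H_{im}}{(W^T W H)_{im}}$$

Substituting Eq. (3-6) into Eq. (3-5) yields the H update rule of Eq. (3-4):

$$
\begin{aligned}
H_{im} &\leftarrow H_{im} + \frac{H_{im}}{(W^T W H)_{im}}\left[(W^T V)_{im} - (W^T W H)_{im}\right] \\
&= \frac{H_{im}(W^T W H)_{im} + H_{im}(W^T V)_{im} - H_{im}(W^T W H)_{im}}{(W^T W H)_{im}} \\
&= \frac{H_{im}(W^T V)_{im}}{(W^T W H)_{im}} \\
&= H_{im}\,\frac{(W^T V)_{im}}{(W^T W H)_{im}}
\end{aligned}
$$

Similarly, starting from the loss function, the update rule for W under the Euclidean distance can be rewritten as:

$$W_{ni} \leftarrow W_{ni} + \eta_{ni}\left[(V H^T)_{ni} - (W H H^T)_{ni}\right]$$

When $\eta_{ni}$ is close to 0, the update again reduces the loss $\|V - WH\|$. Now set $\eta_{ni}$ to:

$$\eta_{ni} = \frac{W_{ni}}{(W H H^T)_{ni}}$$

Substituting Eq. (4-8) into Eq. (4-5) yields the W update rule of Eq. (4-4):

$$
\begin{aligned}
W_{ni} &\leftarrow W_{ni} + \frac{W_{ni}}{(W H H^T)_{ni}}\left[(V H^T)_{ni} - (W H H^T)_{ni}\right] \\
&= \frac{W_{ni}(W H H^T)_{ni} + W_{ni}(V H^T)_{ni} - W_{ni}(W H H^T)_{ni}}{(W H H^T)_{ni}} \\
&= \frac{W_{ni}(V H^T)_{ni}}{(W H H^T)_{ni}} \\
&= W_{ni}\,\frac{(V H^T)_{ni}}{(W H H^T)_{ni}}
\end{aligned}
$$

Because the chosen $\eta_{im}$ is not a small value, the update is not guaranteed to decrease the loss the way ordinary gradient descent does. We therefore go on to examine whether this choice of $\eta_{im}$ is valid.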
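The substitution above can be checked numerically. The following is a minimal NumPy sketch, not code from the thesis: the function names, matrix sizes, and random test data are illustrative assumptions, and the inputs are kept strictly positive so that no denominator is zero. It compares the additive update using the chosen step size with the multiplicative update, and evaluates the loss before and after one update.

```python
import numpy as np

def loss(V, W, H):
    """Squared Euclidean loss ||V - WH||^2."""
    return np.sum((V - W @ H) ** 2)

def additive_H(V, W, H):
    """H + eta * (W^T V - W^T W H) with the step size eta = H / (W^T W H)."""
    WtV, WtWH = W.T @ V, W.T @ W @ H
    eta = H / WtWH                      # element-wise step size
    return H + eta * (WtV - WtWH)

def multiplicative_H(V, W, H):
    """H * (W^T V) / (W^T W H), all operations element-wise."""
    return H * (W.T @ V) / (W.T @ W @ H)

def additive_W(V, W, H):
    """W + eta * (V H^T - W H H^T) with the step size eta = W / (W H H^T)."""
    VHt, WHHt = V @ H.T, W @ H @ H.T
    eta = W / WHHt
    return W + eta * (VHt - WHHt)

def multiplicative_W(V, W, H):
    """W * (V H^T) / (W H H^T), all operations element-wise."""
    return W * (V @ H.T) / (W @ H @ H.T)

# Hypothetical test data: strictly positive so every denominator is non-zero.
rng = np.random.default_rng(0)
V = rng.random((6, 8)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 8)) + 0.1

H_add, H_mul = additive_H(V, W, H), multiplicative_H(V, W, H)
W_add, W_mul = additive_W(V, W, H), multiplicative_W(V, W, H)

print("H updates agree:", np.allclose(H_add, H_mul))
print("W updates agree:", np.allclose(W_add, W_mul))
print("loss before / after H update:", loss(V, W, H), loss(V, W, H_mul))
```

The two forms agree because the step size cancels the $-H_{im}$ term exactly. The sketch only confirms the algebra on one random example; whether the loss decreases in general is the convergence question.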

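(2) Proof of convergence

To prove that the update rule (Eq. (3-4)) converges, we need to define an auxiliary function, in a manner similar to the Expectation-Maximization (EM) algorithm. The auxiliary function is introduced to help us estimate the actual function.

$G(h, h')$ is an auxiliary function for the actual function $F(h)$ if the following conditions hold:

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h)$$

The relationship between the auxiliary function and the actual function is shown in Figure A1-1:

Figure A1-1: Illustration of the auxiliary function and the actual function; $G(h, h')$ is the auxiliary function and $F(h)$ is the actual function.

Figure A1-1 shows that as $h$ is updated, the function value keeps decreasing, that is, it moves closer to the minimum. The update can be written as:

$$h^{t+1} = \arg\min_{h} G(h, h^t)$$

This gives $F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t)$, so $F$ never increases.

We define a positive semidefinite matrix:

$$M_{ab}(h^t) = h_a^t \left(K(h^t) - W^T W\right)_{ab} h_b^t$$

Here $K(h^t)$ is taken to be the diagonal matrix $K_{ab}(h^t) = \delta_{ab}\,(W^T W h^t)_a / h_a^t$, the form used in the standard convergence proof for these updates. When $M$ is positive semidefinite, $K(h^t) - W^T W$ is positive semidefinite as well. For any vector $\nu$:

$$
\begin{aligned}
\nu^T M \nu &= \sum_{ab} \nu_a M_{ab} \nu_b \\
&= \sum_{ab} \left[ h_a^t (W^T W)_{ab} h_b^t \nu_a^2 - \nu_a h_a^t (W^T W)_{ab} h_b^t \nu_b \right] \\
&= \sum_{ab} (W^T W)_{ab} h_a^t h_b^t \left[ \tfrac{1}{2}\nu_a^2 + \tfrac{1}{2}\nu_b^2 - \nu_a \nu_b \right] \\
&= \frac{1}{2} \sum_{ab} (W^T W)_{ab} h_a^t h_b^t \left(\nu_a - \nu_b\right)^2 \\
&\ge 0
\end{aligned}
$$

Since the quadratic term of $G$ is built from $K(h^t)$ while that of $F$ is built from $W^T W$, the difference $G(h, h^t) - F(h)$ is a quadratic form in $K(h^t) - W^T W$. Hence $G(h, h') \ge F(h)$ is proved.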
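The positive semidefiniteness argument above can be spot-checked numerically. The sketch below is illustrative only. It assumes that $K(h^t)$ is the diagonal matrix with entries $(W^T W h^t)_a / h_a^t$, which this excerpt does not spell out, and it uses hypothetical function names and random non-negative test data. It builds $M$, evaluates $\nu^T M \nu$ directly, and compares the result with the closed form $\tfrac{1}{2}\sum_{ab}(W^T W)_{ab} h_a^t h_b^t (\nu_a - \nu_b)^2$.

```python
import numpy as np

def build_M(W, h):
    """M_ab = h_a (K - W^T W)_ab h_b, assuming K = diag((W^T W h)_a / h_a)."""
    WtW = W.T @ W
    K = np.diag((WtW @ h) / h)
    return np.outer(h, h) * (K - WtW)

def closed_form(W, h, nu):
    """0.5 * sum_ab (W^T W)_ab h_a h_b (nu_a - nu_b)^2."""
    WtW = W.T @ W
    diff = nu[:, None] - nu[None, :]        # matrix of nu_a - nu_b
    return 0.5 * np.sum(WtW * np.outer(h, h) * diff ** 2)

# Hypothetical test data: non-negative W, strictly positive h, arbitrary nu.
rng = np.random.default_rng(0)
W = rng.random((6, 4))
h = rng.random(4) + 0.1
nu = rng.standard_normal(4)

M = build_M(W, h)
quad = nu @ M @ nu                          # direct evaluation of nu^T M nu
min_eig = np.linalg.eigvalsh(M).min()       # M is symmetric, so eigvalsh applies

print("nu^T M nu        :", quad)
print("closed form      :", closed_form(W, h, nu))
print("smallest eigenvalue of M:", min_eig)
```

The smallest eigenvalue is expected to be zero up to rounding rather than strictly positive, because a constant $\nu$ makes every $(\nu_a - \nu_b)^2$ term vanish.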

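Appendix 2: Detailed Algorithm for Sparse Non-negative Matrix Factorization and Its Explanation

Algorithm 1: Sparse non-negative matrix factorization.

(1) Initialize W and H.

(2) If W is to be sparsified, project its vectors onto non-negative vectors, keeping the L2 norm fixed and adjusting the L1 norm to reach the desired sparseness.

(3) If H is to be sparsified, project its vectors onto non-negative vectors, rescaling the L2 norm to unit length and adjusting the L1 norm to reach the desired sparseness.

(4) Iterate:

a. If W is to be sparsified:

i. Set $W \leftarrow W - \mu_W (WH - V) H^T$.

ii. Project its vectors onto non-negative vectors, keeping the L2 norm fixed and adjusting the L1 norm to reach the desired sparseness.

If W is not sparsified, use the original NMF update, namely $W_{ni} \leftarrow W_{ni}\,\dfrac{(V H^T)_{ni}}{(W H H^T)_{ni}}$.

b. If H is to be sparsified:

i. Set $H \leftarrow H - \mu_H W^T (WH - V)$.

ii. Project its vectors onto non-negative vectors, rescaling the L2 norm to unit length and adjusting the L1 norm to reach the desired sparseness.

If H is not sparsified, use the original NMF update, namely $H_{im} \leftarrow H_{im}\,\dfrac{(W^T V)_{im}}{(W^T W H)_{im}}$.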
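The steps of Algorithm 1 can be sketched in NumPy for the case where only W is sparsified. This is a minimal sketch under stated assumptions, not the thesis implementation. It assumes that the sparseness follows Hoyer's definition, $(\sqrt{n} - L_1/L_2)/(\sqrt{n} - 1)$, and that the projection is Hoyer's iterative operator. It also assumes that the vectors being projected are the columns of W. The step-size halving is added here only to keep the loss from increasing. All function names, sizes, and the target sparseness 0.6 are hypothetical.

```python
import numpy as np

def sparseness(x):
    """Assumed (Hoyer) sparseness: 0 for a constant vector, 1 for one non-zero entry."""
    n = x.size
    return (np.sqrt(n) - np.abs(x).sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)

def project(x, l1, l2):
    """Closest non-negative vector to x with entry sum l1 and Euclidean norm l2.
    Assumes l2 <= l1 <= sqrt(n) * l2 and that x is not a constant vector."""
    n = x.size
    v = x + (l1 - x.sum()) / n                      # move onto the sum = l1 hyperplane
    zero = np.zeros(n, dtype=bool)                  # entries pinned to zero so far
    while True:
        mid = np.where(zero, 0.0, l1 / (n - zero.sum()))
        w = v - mid
        a, b, c = w @ w, 2.0 * (w @ v), v @ v - l2 ** 2
        alpha = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        v = v + alpha * w                           # now ||v||_2 = l2
        if np.all(v >= 0):
            return v
        zero |= v <= 0                              # pin non-positive entries to zero
        v[zero] = 0.0
        v[~zero] += (l1 - v.sum()) / (~zero).sum()  # restore the sum constraint

def project_columns(W, s_w):
    """Steps (2) and (4-a-ii): per column, keep the L2 norm and set the L1 norm."""
    n = W.shape[0]
    out = np.empty_like(W)
    for i in range(W.shape[1]):
        l2 = np.linalg.norm(W[:, i])
        l1 = l2 * (np.sqrt(n) - s_w * (np.sqrt(n) - 1))
        out[:, i] = project(W[:, i], l1, l2)
    return out

def loss(V, W, H):
    return 0.5 * np.sum((V - W @ H) ** 2)

def sparse_nmf(V, rank, s_w, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    W = project_columns(rng.random((V.shape[0], rank)), s_w)   # steps (1) and (2)
    H = rng.random((rank, V.shape[1]))
    mu, history = 1.0, [loss(V, W, H)]
    for _ in range(n_iter):                                     # step (4)
        grad = (W @ H - V) @ H.T
        while mu > 1e-12:                                       # step (4-a) with halving
            W_new = project_columns(W - mu * grad, s_w)
            if loss(V, W_new, H) <= history[-1]:
                W, mu = W_new, 1.2 * mu
                break
            mu /= 2
        H = H * (W.T @ V) / (W.T @ W @ H + 1e-12)               # H is not sparsified
        history.append(loss(V, W, H))
    return W, H, history

# Hypothetical usage on random non-negative data.
V = np.random.default_rng(1).random((12, 30))
W, H, history = sparse_nmf(V, rank=4, s_w=0.6)
print("column sparseness:", [round(float(sparseness(W[:, i])), 3) for i in range(4)])
```

Sparsifying H instead would follow the same pattern, with the gradient step of (4-b-i) and a projection that rescales each vector to unit L2 norm.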

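Step (1) of Algorithm 1 initializes the basis matrix and the weight matrix. Steps (2) and (3) describe how the basis matrix and the weight matrix are sparsified. Step (4) then iterates to obtain sparse, non-negative basis and weight matrices. Step (4-a) is executed when the basis matrix is to be sparsified: the basis matrix is first set to the value computed in (4-a-i), a formula obtained by differentiating the loss function, and step (4-a-ii) then sparsifies the basis matrix and checks that it is non-negative. The detailed procedure of this step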
