Future Work - A Study on Exploiting Spatial-Temporal Feature Distribution Information for Robust Speech Recognition

Possible directions for future research are as follows:

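(1) The cumulative distribution functions (CDFs) used by the methods proposed in this thesis consider only the distribution of the features in the time domain and ignore the components of speech structure. Three possible remedies are: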

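(a) Use a Gaussian mixture model (GMM) trained on clean speech and, at test time, compute for each speech feature vector the sum of its probability density values over the individual Gaussian components, using that sum as the cumulative distribution function.

(b) Train a Gaussian mixture model in advance for each digit model using stereo data and, at test time, compute for each speech feature vector the summed probability density over all models.

(c) Replace the stereo data with a vector Taylor series (VTS) expansion, estimate the statistical distribution of the noisy speech on the fly at test time, and compute the cumulative distribution function of each speech vector under the statistical distribution of that noisy utterance.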

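(2) Within the normalization framework proposed in this thesis, which is based on spatial and temporal feature distributions, principal component analysis (PCA) could serve as the objective function: a set of eigenvectors is first obtained from noisy speech data and then combined with the minimum sum-of-squared-errors criterion, so that the error with respect to that set of eigenvectors is minimized.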

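(3) The preliminary experiments in this thesis were all carried out on small-vocabulary recognition tasks. In the future, the techniques developed here will be applied to large vocabulary continuous speech recognition (LVCSR) to verify the effectiveness of the proposed methods.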
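
To make the first approach listed under direction (1) more concrete, the following is a minimal sketch of one possible reading of it, not the thesis's implementation. It assumes a single feature dimension and treats the CDF as the weighted sum of the component CDFs of a univariate Gaussian mixture. The function names and the mixture parameters (`WEIGHTS`, `MEANS`, `STDS`) are hypothetical placeholders for a model that would in practice be trained on clean speech.

```python
import math


def gaussian_cdf(x, mean, std):
    """CDF of a univariate Gaussian evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gmm_cdf(x, weights, means, stds):
    """CDF of a univariate Gaussian mixture: weighted sum of component CDFs."""
    return sum(w * gaussian_cdf(x, m, s) for w, m, s in zip(weights, means, stds))


def normalize_track(track, weights, means, stds):
    """Map each value of one feature dimension to its mixture CDF value in [0, 1]."""
    return [gmm_cdf(x, weights, means, stds) for x in track]


# Hypothetical two-component mixture standing in for a model trained on clean speech.
WEIGHTS = [0.5, 0.5]
MEANS = [-1.0, 1.0]
STDS = [1.0, 1.0]

if __name__ == "__main__":
    # One feature dimension of a short test utterance (made-up values).
    track = [-3.0, 0.0, 3.0]
    print(normalize_track(track, WEIGHTS, MEANS, STDS))
```

Because the mixture CDF is monotonic, the mapped values preserve the ordering of the original feature values. In a histogram-equalization setting they could then be passed through the inverse CDF of a reference distribution, which this sketch leaves out.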


