Future Work - Applications of Hierarchical Extreme Learning Machines in Speech Signal Processing

National Chengchi University (國立政治大學)

works is assessed using several objective and subjective evaluation metrics, namely PESQ, STOI, SDI, SSNRI, Cep, LLR, SRMR, FeSSNR, HASPI, and MUSHRA, under matched and mismatched testing conditions at different SNR and RIR levels. These metrics are directly correlated with the speech quality, intelligibility, and perceived amount of noise/reverberation of the estimated speech signals. The performance is compared against state-of-the-art deep neural and conventional speech signal processing algorithms to verify the effectiveness and robustness of the proposed frameworks. The proposed frameworks performed exceptionally well and exhibited better performance, producing less discontinuous output signals. Deep neural structures require a sufficient (generally large) amount of training data to generalize well and to capture the relationships among diverse features. HELM frameworks, in contrast, attempt to learn the spectral mapping from a limited amount of training data and offer better universal approximation capability, which can effectively resolve the data requirement problem. Moreover, their fixed characteristics and tight temporal requirements make HELM-based frameworks suitable for hardware implementation, where usually only the forward phase is implemented, to obtain efficient classification/regression ability. The results further demonstrate the great potential and applicability of HELM-based solutions in real-time situations where data arrive in a sequential stream and the environment is dynamically changing and nonstationary.
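As a rough illustration of why ELM-style models need little training effort and only a forward phase at deployment, the sketch below fits a single-hidden-layer extreme learning machine with NumPy. It is not the HELM framework evaluated in this thesis, which is hierarchical; it is a single-layer simplification. The function names `train_elm` and `predict_elm`, the tanh activation, the hidden size, the ridge value, and the toy sine-regression data are all assumptions chosen for brevity.

```python
import numpy as np


def train_elm(X, T, n_hidden=64, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM regressor (illustrative sketch).

    Hidden weights are drawn once at random and never updated; only the
    output weights are solved for, in closed form, by ridge regression.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights
    b = rng.standard_normal(n_hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Regularised least squares: beta = (H^T H + ridge * I)^-1 H^T T
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Forward phase only: a matrix product, a nonlinearity, a matrix product."""
    return np.tanh(X @ W + b) @ beta


# Toy stand-in for a mapping task (assumed data): learn y = sin(x) from 200 samples.
rng = np.random.default_rng(1)
X_train = rng.uniform(-3.0, 3.0, size=(200, 1))
T_train = np.sin(X_train)
W, b, beta = train_elm(X_train, T_train)

X_test = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
mse = float(np.mean((predict_elm(X_test, W, b, beta) - np.sin(X_test)) ** 2))
print(f"test MSE: {mse:.5f}")
```

Because the hidden layer is fixed after initialization, training reduces to one linear solve, and inference needs only the operations in `predict_elm`, which is the property that makes a forward-phase-only hardware implementation plausible.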

7.2 Future Work

The experimental findings in this thesis have demonstrated the great potential of HELM-based solutions for speech signal processing when only a relatively limited amount of training data is available. We will continue to leverage the characteristics of HELM in our future research and explore its applicability to model adaptation and self-supervised learning scenarios under extreme conditions. Multitask learning and transfer learning approaches have recently been adopted to improve the performance of deep learning models. Moving forward, we will adopt these two approaches in a new study to investigate their compatibility with the proposed HELM frameworks and to achieve further performance improvements. Moreover, we intend to propose HELM-based noise-aware and SNR-aware training criteria to enhance noise and reverberation suppression capabilities. Beyond the satisfactory performance achieved by the proposed audio-only frameworks for speech signal processing, the multimodal frameworks demonstrated better behavior by relying more on visual information, which provided guidance when handling low SNRs and making a decision. Based on these findings, another interesting direction is to use a reinforcement learning technique to gain more insight into the behavior of the multimodal system under low-SNR and mismatched noise conditions. Subsequently, a model compression strategy, alongside residual and highway connections, can be used to enrich the information available for spectral mapping and further reduce computational costs. We will also examine the applicability of attention mechanisms to HELM structures so that they focus more on the essential frames in noisy and reverberant signals. Finally, our future work will also address the situation in which more training data is available.
