國立政治大學 ‧ National Chengchi University
works is assessed using several objective and subjective evaluation metrics, namely PESQ, STOI, SDI, SSNRI, Cep, LLR, SRMR, FeSSNR, HASPI, and MUSHRA, under matched and mismatched testing conditions at different SNR and RIR levels. These metrics correlate directly with the speech quality, intelligibility, and perceived amount of noise/reverberation of the estimated speech signals. The performance is compared against state-of-the-art deep neural and conventional speech signal processing based algorithms to verify the effectiveness and robustness of the proposed frameworks. The proposed frameworks performed well and exhibited better performance, producing fewer discontinuities in the output signal. Deep neural structures generally require a large amount of training data to generalize well and capture the relationships among diverse features. HELM frameworks, in contrast, attempt to learn the spectral mapping from a limited amount of training data and offer better universal approximation capability, which can effectively ease the data requirement problem. Moreover, the fixed characteristics and tight temporal requirements of HELM-based frameworks make them candidates for hardware implementation, which usually implements the forward phase, to obtain efficient classification/regression ability. The results further demonstrate the potential and applicability of HELM-based solutions in real-time situations where data arrives in a sequential stream and the environment is dynamically changing and nonstationary.
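The limited-data and forward-phase points above can be illustrated with a minimal sketch of a single-hidden-layer extreme learning machine regressor. This is a simplified stand-in for the hierarchical HELM frameworks, not the implementation used in this thesis. The function names, the tanh activation, the ridge term, the hidden-layer size, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def train_elm(X, Y, n_hidden=64, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM regressor (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights are drawn once at random and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    # Output weights come from one ridge-regularized least-squares solve,
    # so training needs no iterative backpropagation.
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ Y)
    return W, b, beta


def predict_elm(X, params):
    """Forward phase only: random projection, nonlinearity, linear readout."""
    W, b, beta = params
    return np.tanh(X @ W + b) @ beta


# Toy regression data standing in for (noisy, clean) spectral frames.
rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(200, 4))
Y_train = np.sin(X_train).sum(axis=1, keepdims=True)

params = train_elm(X_train, Y_train)
Y_hat = predict_elm(X_train, params)
train_mse = float(np.mean((Y_hat - Y_train) ** 2))
```

Because the hidden weights stay fixed after initialization, inference reduces to the two matrix products in `predict_elm`, which is the part a hardware implementation would typically need to realize.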
7.2 Future Work
The experimental findings in this thesis have demonstrated the potential of HELM-based solutions for speech signal processing when a relatively limited amount of training data is available. We will continue to leverage the characteristics of HELM in our future research and explore its applicability to model adaptation and self-supervised learning scenarios under extreme conditions. Multitask learning and transfer learning approaches have recently been adopted to improve the performance of deep learning models. Moving forward, we will adopt these two approaches in a new study to investigate their compatibility with the proposed HELM frameworks and achieve further performance improvements. Moreover, we intend to propose HELM-based noise-aware and SNR-aware training criteria to enhance noise and reverberation suppression capabilities. Beyond the satisfactory performance achieved by the proposed audio-only frameworks for speech signal processing, the multimodal frameworks demonstrated better behavior by relying more on the visual information, which provided guidance when handling low SNRs. Based on these findings, another interesting direction is to use a reinforcement learning technique to gain more insight into the behavior of the multimodal system under low-SNR and mismatched noise conditions. Subsequently, a model compression strategy alongside residual and highway connections can be used to enrich the information for spectral mapping and further reduce computational costs. We will also examine the applicability of attention mechanisms to the HELM structures so that they focus more on the essential frames in noisy and reverberant signals. Finally, our future work will also consider the situation where more training data is available.