National Chengchi University (國立政治大學)

works is assessed using several objective and subjective evaluation metrics, namely PESQ, STOI, SDI, SSNRI, Cep, LLR, SRMR, FeSSNR, HASPI, and MUSHRA, under matched and mismatched testing conditions at different SNR and RIR levels. These metrics are directly correlated with the speech quality, intelligibility, and perceived amount of noise or reverberation in the estimated speech signals. The performance is compared against state-of-the-art deep-neural-network-based and conventional signal-processing-based algorithms to verify the effectiveness and robustness of the proposed frameworks. The proposed frameworks performed exceptionally well and exhibited better performance, producing a less discontinuous output signal. Deep neural structures generally require a sufficient (typically large) amount of training data to generalize well and to capture the relationships among diverse features. HELM frameworks, by contrast, learn the spectral mapping from a limited amount of training data and offer better universal approximation capability, which can effectively alleviate the data requirement problem. Moreover, their fixed characteristics and tight temporal requirements make HELM-based frameworks suitable for hardware implementation, where usually only the forward phase is implemented, to obtain efficient classification or regression ability. The results further demonstrate the great potential and applicability of HELM-based solutions in real-time situations where the data arrive as a sequential stream and the environment is dynamically changing and non-stationary.
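The point about learning a spectral mapping from limited data rests on the closed-form training of extreme learning machines: the hidden weights are drawn at random and kept fixed, and only the output weights are solved by regularized least squares. The following is a minimal sketch of a single-layer ELM regressor, not the hierarchical (HELM) frameworks evaluated in this thesis. The function names `train_elm` and `predict_elm`, the hidden size, the ridge constant, and the toy "noisy to clean" feature data are all assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, ridge=1e-3, seed=0):
    """Fit a single-layer ELM regressor (illustrative sketch, not the thesis code).

    X: (n_samples, n_in) input features, T: (n_samples, n_out) targets.
    """
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are random and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Output weights in closed form: beta = (H^T H + ridge * I)^{-1} H^T T
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict_elm(X, model):
    """Forward phase only: one matrix product, a tanh, and another product."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy data (hypothetical): map "noisy" feature vectors back to "clean" ones.
rng = np.random.default_rng(1)
T = rng.standard_normal((200, 8))                # stand-in for clean features
X = T + 0.3 * rng.standard_normal(T.shape)       # stand-in for noisy features

model = train_elm(X, T)
Y = predict_elm(X, model)
mse = float(np.mean((Y - T) ** 2))
print(f"training MSE: {mse:.4f}")
```

Because training reduces to one linear solve and inference to the forward phase alone, this kind of model needs no iterative back-propagation, which is the property the hardware and real-time remarks above rely on.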

7.2 Future Work

The experimental findings in this thesis have demonstrated the great potential of HELM-based solutions for speech signal processing when only a relatively limited amount of training data is available. We will continue to leverage the characteristics of HELM in our future research and explore its applicability to model adaptation and self-supervised learning scenarios under extreme conditions. Multi-task learning and transfer learning approaches have recently been adopted to improve the performance of deep learning models. Moving forward, we will adopt these two approaches in a new study to investigate their compatibility with the proposed HELM frameworks and to achieve further improvements in performance. Moreover, we intend to propose HELM-based noise-aware and SNR-aware training criteria to enhance noise and reverberation suppression capabilities. Beyond the satisfactory performance achieved by the proposed audio-only frameworks for speech signal processing, the multimodal frameworks performed better by relying more on the visual information, which provided guidance when handling low SNRs. Based on these findings, another interesting direction is to use a reinforcement learning technique to gain more insight into the behavior of the multimodal system under low-SNR and mismatched noise conditions. In addition, a model compression strategy, together with residual and highway connections, can be used to enrich the information available for spectral mapping and to further reduce computational cost. We will also examine the applicability of attention mechanisms to the HELM structures so that they focus more on the essential frames in noisy and reverberant signals. Finally, we will also consider the situation where more training data is available.
