[1] S. J. Young and P. C. Woodland, “The use of state tying in continuous speech recognition,” in Proceedings of the European Conference on Speech Communication and Technology, 1993.

[2] V. Valtchev, J. J. Odell, P. C. Woodland, and S. Young, “Lattice-based discriminative training for large vocabulary speech recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 1996.

[3] P. C. Woodland and D. Povey, “Large scale discriminative training of hidden Markov models for speech recognition,” Computer Speech and Language, vol. 16, no. 1, pp. 25–47, 2002.

[4] D. Povey, “Discriminative training for large vocabulary speech recognition,” Ph.D. dissertation, University of Cambridge, 2004.

[5] M. J. F. Gales, “Maximum likelihood linear transformations for HMM-based speech recognition,” Computer Speech and Language, vol. 12, no. 2, pp. 75–98, 1998.

[6] G. Ye, B. Mak, and M. W. Mak, “Fast GMM computation for speaker verification using scalar quantization and discrete densities,” in Proceedings of the International Conference on Speech Communication and Technology, 2009.

[7] E. Trentin and M. Gori, “A survey of hybrid ANN/HMM models for automatic speech recognition,” Neurocomputing, vol. 37, no. 1, pp. 91–126, 2001.

[8] M. Ostendorf, V. V. Digalakis, and O. Kimball, “A unified view of stochastic modeling for speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 360–378, 1996.

[9] G. Zweig and P. Nguyen, “A segmental conditional random field toolkit for speech recognition,” in Proceedings of the International Conference on Speech Communication and Technology, 2010.

[10] M. Mohri, F. Pereira, and M. Riley, “Weighted finite-state transducers in speech recognition,” Computer Speech and Language, vol. 16, no. 1, pp. 69–88, 2002.

[11] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.

[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[13] T. Sercu, C. Puhrsch, B. Kingsbury, and Y. LeCun, “Very deep multilingual convolutional neural networks for LVCSR,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2016.

[14] A. Graves, A. R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2013.

[15] J. Li, A. Mohamed, G. Zweig, and Y. Gong, “Exploring multidimensional LSTMs for large vocabulary ASR,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2016.

[16] J. M. Benyus, Biomimicry: Innovation Inspired by Nature. William Morrow and Company, 1997.

[17] M. T. Hagan, H. B. Demuth, M. H. Beale, and O. D. Jesus, Neural Network Design. PWS Publishing Company, 1996.

[18] D. J. Felleman and D. C. Van Essen, “Distributed hierarchical processing in the primate cerebral cortex,” Cerebral Cortex, vol. 1, no. 1, pp. 1–47, 1991.

[19] R. Caruana, “Multitask learning,” Ph.D. dissertation, Carnegie Mellon University, 1997.

[20] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the International Conference on Machine Learning, 2008.

[21] G. Tur, “Multitask learning for spoken language understanding,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2006.

[22] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the International Conference on Image Processing, 2013.

[23] M. Seltzer and J. Droppo, “Multi-task learning in deep neural networks for improved phoneme recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2013.

[24] D. Wolpert and W. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[25] V. Vapnik, The Nature of Statistical Learning Theory. Springer, 2000.

[26] S. Furui, “Generalization problem in ASR acoustic model training and adaptation,” in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2009.

[27] B. S. Atal, “Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification,” Journal of the Acoustical Society of America, vol. 55, no. 6, pp. 1304–1312, 1974.

[28] O. Viikki and K. Laurila, “Cepstral domain segmental feature vector normalization for noise robust speech recognition,” Speech Communication, vol. 25, pp. 133–147, 1998.

[29] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: John Wiley & Sons, 2000.

[30] R. Rosenfeld, “A maximum entropy approach to adaptive statistical language modeling,” Computer Speech and Language, vol. 10, no. 2, pp. 187–228, 1996.

[31] J. W. Kuo and B. Chen, “Minimum word error based discriminative training of language models,” in Proceedings of the International Conference on Speech Communication and Technology, 2005.

[32] S. M. Katz, “Estimation of probabilities from sparse data for the language model component of a speech recognizer,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 35, no. 3, pp. 400–401, 1987.

[33] H. Ney, U. Essen, and R. Kneser, “On structuring probabilistic dependences in stochastic language modeling,” Computer Speech and Language, vol. 8, pp. 1–38, 1994.

[34] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Transactions on Information Theory, vol. 13, no. 2, 1967.

[35] S. Ortmanns, H. Ney, and X. Aubert, “A word graph algorithm for large vocabulary continuous speech recognition,” Computer Speech and Language, vol. 11, pp. 11–72, 1997.

[36] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.

[37] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, vol. 29, pp. 82–97, 2012.

[38] L. Deng and X. Li, “Machine learning paradigms for speech recognition: An overview,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 1060–1089, 2013.

[39] A. E. Bryson and Y. C. Ho, Applied optimal control: Optimization, estimation, and control. Blaisdell Publishing Company, 1969.

[40] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.

[41] O. Abdel-Hamid, L. Deng, and D. Yu, “Exploring convolutional neural network structures and optimization techniques for speech recognition,” in Proceedings of the International Conference on Speech Communication and Technology, 2013.

[42] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2012.

[43] K. Chellapilla, S. Puri, and P. Simard, “High performance convolutional neural networks for document processing,” in Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2006.

[44] D. C. Ciresan, U. Meier, and J. Schmidhuber, “Transfer learning for Latin and Chinese characters with deep neural networks,” in Proceedings of the International Joint Conference on Neural Networks, 2012.

[45] L. Deng, O. Abdel-Hamid, and D. Yu, “A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2013.

[46] T. N. Sainath, B. Kingsbury, A. R. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, and B. Ramabhadran, “Improvements to deep convolutional neural networks for LVCSR,” in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2013.

[47] T. N. Sainath, A. R. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2013.

[48] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex,” Journal of Physiology, vol. 160, pp. 106–154, 1962.

[49] D. Scherer, A. Muller, and S. Behnke, “Evaluation of pooling operations in convolutional architectures for object recognition,” in Proceedings of the International Conference on Artificial Neural Networks, 2010.

[50] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: Data mining, inference and prediction. Springer, 2009.

[51] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.

[52] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.

[53] S. Thrun and L. Pratt, Learning to learn. Kluwer Academic Publishers, 1998.

[54] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 1345–1359, 2010.

[55] H. C. Ellis, Transfer of Learning. The Macmillan Company, 1965.

[56] E. L. Thorndike and R. S. Woodworth, “The influence of improvement in one mental function upon the efficiency of the other functions,” Psychological Review, vol. 8, pp. 247–261, 1901.

[57] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Dataset Shift in Machine Learning. MIT Press, 2009.

[58] R. K. Gupta and S. D. Senturia, “Learning and evaluating classifiers under sample selection bias,” in Proceedings of the International Conference on Machine Learning, 2004.

[59] J. Huang, A. Smola, A. Gretton, K. M. Borgwardt, and B. Scholkopf, “Correcting sample selection bias by unlabeled data,” Advances in Neural Information Processing Systems, vol. 19, pp. 601–608, 2007.

[60] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.

[61] A. Gretton, K. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola, “A kernel method for the two-sample problem,” Advances in Neural Information Processing Systems, vol. 19, pp. 513–520, 2007.

[62] M. Sugiyama, S. Nakajima, H. Kashima, P. V. Buenau, and M. Kawanabe, “Direct importance estimation with model selection and its application to covariate shift adaptation,” Advances in Neural Information Processing Systems, vol. 20, pp. 1433–1440, 2008.

[63] T. Kanamori, S. Hido, and M. Sugiyama, “A least-squares approach to direct importance estimation,” Journal of Machine Learning Research, vol. 10, pp. 1391–1445, 2009.

[64] L. Duan, I. W. Tsang, D. Xu, and T.-S. Chua, “Domain adaptation from multiple sources via auxiliary classifiers,” in Proceedings of the International Conference on Machine Learning, 2009.

[65] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.

[66] J. Jiang and C. Zhai, “Instance weighting for domain adaptation in NLP,” in Proceedings of the International Conference on Association for Computational Linguistics, 2007.

[67] W. Dai, Q. Yang, G. Xue, and Y. Yu, “Boosting for transfer learning,” in Proceedings of the International Conference on Machine Learning, 2007.

[68] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in Proceedings of the European Conference on Computational Learning Theory, 1995.

[69] S. J. Pan, J. T. Kwok, and Q. Yang, “Transfer learning via dimensionality reduction,” in Proceedings of the International Conference on Association for the Advancement of Artificial Intelligence, 2008.

[70] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, pp. 199–210, 2011.

[71] X. Shi, W. Fan, Q. Yang, and J. Ren, “Relaxed transfer of different classes via spectral partition,” in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery, 2009.

[72] L. Mihalkova, T. Huynh, and R. J. Mooney, “Mapping and revising Markov logic networks for transfer learning,” in Proceedings of the International Conference on Association for the Advancement of Artificial Intelligence, 2007.

[73] M. Richardson and P. Domingos, “Markov logic networks,” Machine Learning Journal, vol. 62, pp. 107–136, 2006.

[74] L. Mihalkova and R. J. Mooney, “Transfer learning by mapping with minimal target data,” in Proceedings of the International Conference on Association for the Advancement of Artificial Intelligence, 2008.

[75] J. Davis and P. Domingos, “Deep transfer via second-order Markov logic,” in Proceedings of the International Conference on Machine Learning, 2009.

[76] F. Li, S. J. Pan, O. Jin, Q. Yang, and X. Zhu, “Cross-domain co-extraction of sentiment and topic lexicons,” in Proceedings of the International Conference on Association for Computational Linguistics, 2012.

[77] X. Ling, G. R. Xue, W. Dai, Y. Jiang, Q. Yang, and Y. Yu, “Can Chinese web pages be classified with English data source?” in Proceedings of the International Conference on World Wide Web, 2008.

[78] C. Wang and S. Mahadevan, “Heterogeneous domain adaptation using manifold alignment,” in Proceedings of the International Joint Conference on Artificial Intelligence, 2011.

[79] P. Prettenhofer and B. Stein, “Cross-language text classification using structural correspondence learning,” in Proceedings of the International Conference on Association for Computational Linguistics, 2010.

[80] G. Qi, C. C. Aggarwal, and T. S. Huang, “Towards semantic knowledge propagation from text corpus to web images,” in Proceedings of the International Conference on World Wide Web, 2011.

[81] Y. Chen, O. Jin, G. R. Xue, J. Chen, and Q. Yang, “Visual contextual advertising: Bringing textual advertisements to images,” in Proceedings of the International Conference on Association for the Advancement of Artificial Intelligence, 2010.

[82] B. Kulis, K. Saenko, and T. Darrell, “What you saw is not what you get: Domain adaptation using asymmetric kernel transforms,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2011.

[83] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proceedings of the European Conference on Computer Vision, 2010.

[84] J. Baxter, “A model of inductive bias learning,” Journal of Artificial Intelligence Research, vol. 12, pp. 149–198, 2000.

[85] S. Ben-David and R. Schuller, “Exploiting task relatedness for multiple task learning,” in Proceedings of the International Conference on Learning Theory, 2003.

[86] T. Kato, H. Kashima, M. Sugiyama, and K. Asai, “Multi-task learning via conic programming,” Advances in Neural Information Processing Systems, pp. 737–744, 2008.

[87] Y. Zhang and D. Yeung, “A convex formulation for learning task relationships in multi-task learning,” in Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2010.

[88] H. Fei and J. Huan, “Structured feature selection and task relationship inference for multi-task learning,” Knowledge and Information Systems, vol. 35, no. 2, pp. 345–364, 2013.

[89] S. Parveen and P. D. Green, “Multitask learning in connectionist ASR using recurrent neural networks,” in Proceedings of the European Conference on Speech Communication and Technology, 2003, pp. 1813–1816.

[90] A. Ghoshal, P. Swietojanski, and S. Renals, “Multilingual training of deep neural networks,” in Proceedings of the International Conference on Speech Communication and Technology, 2013.

[91] J. T. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, “Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers,” in Proceedings of the International Conference on Speech Communication and Technology, 2013.

[92] T. Schultz and A. Waibel, “Language-independent and language adaptive acoustic modeling for speech recognition,” Speech Communication, vol. 35, no. 1, pp. 31–51, 2001.

[93] N. T. Vu, F. Kraus, and T. Schultz, “Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2011.

[94] P. Swietojanski, A. Ghoshal, and S. Renals, “Unsupervised crosslingual knowledge transfer in DNN-based LVCSR,” in Proceedings of the International Conference on Spoken Language Technology Workshop, 2012.

[95] C. Bucilua, R. Caruana, and A. Niculescu-Mizil, “Model compression,” in Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2006.

[96] G. E. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv:1503.02531, 2015.

[97] Z. Tang, D. Wang, and Z. Zhang, “Recurrent neural network training with dark knowledge transfer,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2016.

[98] G. Valentini and F. Masulli, “Ensembles of learning machines,” in Neural Nets, ser. Lecture Notes in Computer Science. Springer, 2002.

[99] F. Alimoglu and E. Alpaydin, “Combining multiple representations and classifiers for pen-based handwritten digit recognition,” in Proceedings of the International Conference on Document Analysis and Recognition, 1997.

[100] C. Kaynak and E. Alpaydin, “Multistage cascading of multiple classifiers: One man’s noise is another man’s data,” in Proceedings of the International Conference on Machine Learning, 2000.

[101] R. Jacobs, “Bias/variance analysis for mixtures-of-experts architectures,” Neural Computation, vol. 9, pp. 369–383, 1997.

[102] D. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, pp. 241–259, 1992.

[103] A. Sankar, “Bayesian model combination (BAYCOM) for improved recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2005.

[104] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[105] B. Efron and R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall/CRC, 1993.

[106] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. CRC Press, 1998.

[107] C. Bishop, Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.

[108] R. E. Schapire, “The strength of weak learnability,” Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.

[109] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proceedings of the International Conference on Machine Learning, 1996.

[110] R. E. Schapire, “The boosting approach to machine learning: An overview,” in Proceedings of the Mathematical Sciences Research Institute Workshop on Nonlinear Estimation and Classification, 2002.

[111] E. Bauer and R. Kohavi, “An empirical comparison of voting classification algorithms: Bagging, boosting, and variants,” Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[112] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: A new explanation of the effectiveness of voting methods,” The Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[113] J. Carletta, “Announcing the AMI meeting corpus,” The ELRA Newsletter, 2006.

[114] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi speech recognition toolkit,” in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2011.

[115] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[116] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: A CPU and GPU math expression compiler,” in Proceedings of the Python for Scientific Computing Conference, 2010.

[117] F. Chollet, “Keras,” https://github.com/fchollet/keras, 2015.

[118] R. A. Gopinath, “Maximum likelihood modeling with Gaussian distributions,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 1998.

[119] L. Deng, J. Li, J. T. Huang, K. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams, Y. Gong, and A. Acero, “Recent advances in deep learning for speech research at Microsoft,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2013.
