
Chapter 6 Conclusion

6.3 Conclusion

Inspired by models of motor control in the brain (FARS and MOSAIC), which describe how humans transform sensory input into motor output, this study derives a reinforcement learning model with self-organizing behavior by combining SOM with Reinforcement Learning theory. This work, together with many earlier studies, shows that SOM and Reinforcement Learning can indeed be integrated to provide an online, dynamic representation and generalization of continuous spaces. Reinforcement Learning serves as the core of the model: it handles the interaction between the agent and the feedback from the environment, and it learns effectively and selects the best action. The topological property of the SOM also proves useful in tasks with similar actions, where the neighborhood Q-learning method accelerates the learning rate. The main weakness of the model is that the convergence of the Q-table depends largely on whether the SOM input layer has itself converged; only when the SOM input layer generalizes the state space early enough can Q-learning build the Q-table effectively. Batch learning alleviates this problem, but it also sacrifices the online applicability of the model; even without batch learning, however, the model can still complete the task. The proposed model still leaves considerable room for improvement, but it clearly resolves one of the difficulties that Reinforcement Learning faces in continuous state spaces.
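As an illustrative aid only, and not the implementation used in this thesis, the Python sketch below shows one common way to couple a SOM with Q-learning: the SOM quantizes the continuous state into a winning unit, and a neighborhood Q-learning update propagates the temporal-difference target to topologically nearby units through a Gaussian neighborhood factor. The class name, hyperparameter values, and the Gaussian weighting are assumptions introduced for this example.

import numpy as np

class SomQAgent:
    """Sketch of a SOM input layer coupled with a Q-table (illustrative assumption)."""

    def __init__(self, grid=(10, 10), state_dim=4, n_actions=2,
                 alpha=0.1, gamma=0.95, sigma=1.0):
        n_units = grid[0] * grid[1]
        self.weights = np.random.rand(n_units, state_dim)   # SOM codebook vectors
        self.q = np.zeros((n_units, n_actions))             # Q-table indexed by SOM unit
        self.coords = np.array([(i, j) for i in range(grid[0])
                                       for j in range(grid[1])], dtype=float)
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma

    def bmu(self, state):
        """Best-matching unit: the SOM node whose weight vector is closest to the state."""
        return int(np.argmin(np.linalg.norm(self.weights - state, axis=1)))

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection on the winning unit's Q-values."""
        if np.random.rand() < epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[self.bmu(state)]))

    def update(self, state, action, reward, next_state, som_lr=0.05):
        """One interaction step: neighborhood Q-learning plus an online SOM update."""
        winner = self.bmu(state)
        target = reward + self.gamma * np.max(self.q[self.bmu(next_state)])

        # Neighborhood Q-learning: units close to the winner on the SOM grid receive
        # a fraction of the same TD update, so similar states share experience.
        grid_dist = np.linalg.norm(self.coords - self.coords[winner], axis=1)
        h = np.exp(-(grid_dist ** 2) / (2 * self.sigma ** 2))
        self.q[:, action] += self.alpha * h * (target - self.q[:, action])

        # Online update of the SOM input layer; its convergence governs how
        # quickly the Q-table itself can converge, as discussed above.
        self.weights += som_lr * h[:, None] * (state - self.weights)

In use, the agent would call update(state, action, reward, next_state) after every interaction step, for example on a cart-pole style balancing task; a larger sigma strengthens the sharing of updates between similar states, which corresponds to the acceleration effect of neighborhood Q-learning described above, while a sigma near zero reduces the scheme to plain Q-learning on the winning unit.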

