Chapter 4 Experimental Analysis
Section 3 Evaluation of Reinforcement-Learning Routing Performance in Different Scenarios
Figure 30. Average number of routing nodes required at different communication ranges
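The results in Figure 30 show that under low interference, because packets can travel farther, all strategies generally need fewer packet transmissions than in the high-interference environment. Among the three, the routing policy learned with Q-Learning needs the fewest.
Comparing with Figure 29, the routing policy learned by Q-Learning in the low-interference environment yields results closer to GPSR's greedy behavior than the SARSA policy does. In the high-interference environment, however, Q-Learning incurs higher overhead and more forwarding hops, yet achieves lower end-to-end delay. This illustrates the flexibility and extensibility that vDRL gains from reinforcement learning. The policy that vDRL follows is hard to describe completely, and the neural-network inference involved makes it harder still to interpret. From the experimental results, however, when the effective communication range is short, exploring more and trying to forward to different surrounding nodes may be more effective for routing than acting greedily.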
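For reference, the greedy behavior that the learned policies are compared against can be sketched as follows. This is a minimal illustration of the greedy forwarding mode of GPSR only, not the simulator code used in this study; the function name `greedy_next_hop` and the 2D coordinate-tuple representation are assumptions made for the example, and perimeter-mode recovery is omitted.

```python
import math

def greedy_next_hop(current, neighbors, destination):
    """Pick the neighbor geographically closest to the destination.

    Returns None when no neighbor is strictly closer than the current
    node (a local maximum, where GPSR would switch to perimeter mode).
    Positions are hypothetical (x, y) tuples.
    """
    best = None
    best_dist = math.dist(current, destination)
    for node in neighbors:
        d = math.dist(node, destination)
        if d < best_dist:
            best, best_dist = node, d
    return best

# A neighbor that makes progress exists, so greedy forwarding succeeds.
hop = greedy_next_hop((0.0, 0.0), [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0)], (5.0, 0.0))

# With a short range the only reachable neighbor may lie behind the sender.
stuck = greedy_next_hop((0.0, 0.0), [(-1.0, 0.0)], (5.0, 0.0))
```

With a short communication range the neighbor set shrinks, so the second case becomes more common. This is one plausible reading of why a policy that also tries non-greedy neighbors could do better in the high-interference setting.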
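4.3.5 Comparison of Routing Success Rates on Map (A) and Map (B)
This experiment examines how the average routing success rate of the routing policies trained by vDRL compares with that of GPSR when the map is enlarged. Compared with Map (A), Map (B) is configured as follows: 1) the area is twenty-five times larger, which indirectly makes the vehicle density sparser; 2) straight road segments are longer, about three times the original length, so vehicles have fewer opportunities to change direction; 3) the effective transmission range is larger, about three times the original.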
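The combined effect of the first and third changes on connectivity can be estimated with a rough calculation. The sketch below assumes that the number of vehicles stays the same on both maps and that vehicles are spread uniformly over the area. Neither assumption is stated in the text, and vehicles in the simulation are in fact confined to roads, so the numbers are only indicative. The baseline values (100 vehicles, unit area, range 0.1) are hypothetical.

```python
import math

def expected_neighbors(num_vehicles, area, tx_range):
    """Mean number of vehicles inside one transmission disc.

    Assumes uniform placement and ignores boundary effects.
    """
    density = num_vehicles / area
    return density * math.pi * tx_range ** 2

# Hypothetical baseline standing in for Map (A).
base = expected_neighbors(num_vehicles=100, area=1.0, tx_range=0.1)

# Map (B): 25 times the area, 3 times the transmission range,
# same vehicle count (an assumption of this sketch).
large = expected_neighbors(num_vehicles=100, area=25.0, tx_range=0.3)

# Coverage grows by 3^2 = 9 while density drops by 25, giving 9/25.
ratio = large / base
```

Under these assumptions each node on Map (B) sees about 36% as many neighbors as on Map (A), so the larger transmission range only partly offsets the sparser density.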
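Under the Map (B) settings, vDRL is harder to train, because it must start from a random routing policy and develop a workable one, and its search space is strongly tied to factors such as map size and vehicle density. This experiment also differs from the setups in 4.3.1 to 4.3.4: vDRL is retrained separately under each of the two map configurations. Figure 31 shows the routing success rates of the three routing policies under the different map parameters: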
Figure 31. Average packet delivery success rate under different map parameters
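Figure 31 shows that vDRL, once retrained on the enlarged map, still achieves a good routing success rate. The policies that vDRL finds through either SARSA or Q-Learning still have a higher routing success probability than GPSR. This again indicates that the reinforcement-learning-based vDRL routing protocol proposed in this study can be trained under different map parameters and reach a reasonable level of routing performance.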
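However, under the Map (B) settings we also found that vDRL's reinforcement-learning training may have convergence problems. This applies especially to Q-Learning, whose decisions maximize the return estimated for the next time step: if any candidate node at the next step has never had a similar state evaluated, the action-value function ranks well-explored choices alongside insufficiently explored ones, and decisions can become badly miscalibrated. The effect is more pronounced than for SARSA, which learns directly from the policy it has been following. In other words, Q-Learning may need more training before the routing policy converges stably on a larger map. The results in Figure 31 also hint that training may not have fully converged: after the map is greatly enlarged, the performance gain that training provides is more limited than on the small map.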
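The difference described above comes from how the two algorithms form their one-step learning targets. The sketch below uses the textbook tabular targets for SARSA and Q-Learning rather than vDRL's neural-network implementation; the node names, reward, discount factor, and value estimates are hypothetical numbers chosen only to show the effect.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, gamma, q_next):
    """Off-policy target: bootstrap from the highest-valued next action."""
    return reward + gamma * max(q_next.values())

# Hypothetical value estimates for the next-hop candidates of the next state.
# "n2" has rarely been explored, so its estimate is unreliable (here too high).
q_next = {"n1": 0.2, "n2": 0.9}

# The behavior policy actually forwards to the well-explored node "n1".
sarsa = sarsa_target(reward=-1.0, gamma=0.9, q_next=q_next, next_action="n1")

# Q-Learning takes the maximum, so the unreliable estimate for "n2" is used.
q_learning = q_learning_target(reward=-1.0, gamma=0.9, q_next=q_next)
```

Here the SARSA target is about -0.82 and the Q-Learning target is about -0.19. The gap comes entirely from the poorly explored candidate, which is consistent with the observation that Q-Learning may need more training on a map with a larger state space.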
Chapter 5 Conclusion and Future Work
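This study proposes vDRL, a routing protocol for the vehicular ad hoc network routing problem that is trained with deep reinforcement learning. Using a controlled simulation environment, it applies SARSA and Q-Learning to find routing policies that are workable in general environments, achieving a high routing success rate, low end-to-end delay, and a low number of required forwarding hops. Across multiple experiments, vDRL outperforms conventional GPSR: the routing success rate differs by a factor of up to 1.5, end-to-end delay is reduced to as little as one half of GPSR's, and the policy trained with SARSA consistently keeps the number of forwarding nodes low. In several experiments whose parameters differ from the training environment, the vDRL protocol also performs consistently well, showing strong generalization. In the experiment of Section 4.3.5, we also retrained vDRL on a larger map and obtained a higher routing success rate than GPSR.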
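Nevertheless, the proposed routing protocol leaves much room for improvement, such as supporting real street maps and incorporating simulation driven by actual traffic-flow data. We also plan to use more realistic vehicle mobility models in future work. At the end of the experiments we found that when the map is enlarged, vDRL training runs into convergence problems and needs more training episodes and computing resources, so training more efficiently on very large maps is another goal for continued improvement. Beyond that, vDRL could be applied to other kinds of vehicles, such as ships and quadcopters: the former move more randomly over vast maps with lower vehicle density, while the latter require routing in three-dimensional space. Extending vDRL from single-path routing to multipath routing is one further way to raise the routing success rate. Applying reinforcement learning to multipath routing involves cooperation and coordination among multiple agents (multi-agent), which has also been an active research direction in reinforcement learning in recent years.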