Master's/Doctoral Thesis 107523042: Detailed Record




Name: Jun-Yong Lin (林峻永)    Department: Communication Engineering
Thesis Title: Spectrum Management for V2X Communication with Multi-Agent Partial Information Sharing
Related Theses
★ Low-Distortion Physical Circuit Layout Security Based on Mosaic Properties
★ Seamless Handover from Wireless LAN to Mobile Networks with Multipath TCP
★ Budget-Constrained Heterogeneous Sub-band Allocation in Cognitive Radio Networks
★ Performance Evaluation of Downlink QoS Scheduling in Multi-Antenna Transmission Environments
★ Integrated Congestion and Path Control for Multipath TCP
★ Opportunistic Scheduling for Multicast over Wireless Networks
★ Low-Complexity Proportional-Fair Scheduling Design for Multi-User MIMO Systems
★ UE and MIMO Mode Selection with Hybrid Antenna Allocation in LTE Heterogeneous Networks
★ Heterogeneous Spectrum Allocation Based on Budget-Limited Auctions
★ Scheduled Grouping for MTC Devices in ID-Sharing Scenarios
★ Efficient Two-Way Vertical Handover with Multipath TCP
★ Congestion and Scheduling Control Supporting Out-of-Order Delivery in Multipath TCP
★ Group Handover Mechanism for Gateway Relocation in Mobile Networks
★ Auction-Based Mobile Data Offloading Using Small Cells
★ Channel Prediction and Proportional-Fair Scheduling Design for High-Speed Rail Environments
★ Hybrid IoT Traffic Generator for Mobile Network Performance Evaluation
1. The electronic full text of this thesis is approved for immediate open access.
2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for academic research purposes.
3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) With the development of vehicle-to-everything (V2X) technology, the new-generation V2X system architecture integrates vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-pedestrian (V2P) communication. Providing low-power, low-latency, highly reliable, and secure data exchange among roadside units, on-board units, and backend servers is a major challenge. Recent research has made remarkable progress in applying reinforcement learning (RL) to vehicular networks, and many researchers have begun to solve resource allocation problems with RL. Multi-agent reinforcement learning (MARL) has drawn even more attention lately because its structure better matches the real user environment. In this thesis, we therefore adopt a MARL framework to study effective resource allocation in V2X that maximizes system throughput and spectrum efficiency. We compare several well-known RL and MARL algorithms and build the environment from realistically simulated road data. Moreover, since conventional MARL focuses on fully decentralized structures, each agent can optimize its policy for the cooperative task using only its own information, which often leads to sub-optimal solutions. We therefore propose a novel multi-agent spectrum allocation method with partial information sharing for vehicular networks, which further improves overall system throughput.
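To make the throughput objective mentioned above concrete, the following is a minimal sketch, under assumed values, of how the sum throughput of the vehicular links could be computed from per-link bandwidths and SINRs via the Shannon capacity. The numbers and the per-link structure are illustrative assumptions, not figures from the thesis.

```python
# Hypothetical sketch (not from the thesis) of the sum-throughput
# objective: each V2I/V2V link's rate is modeled as the Shannon
# capacity of its allocated sub-channel. Bandwidths and SINR values
# below are illustrative assumptions.
import math

def link_rate(bandwidth_hz: float, sinr_linear: float) -> float:
    """Achievable rate (bit/s) of a single link on its sub-channel."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)

def system_throughput(links) -> float:
    """Sum rate over all links, each given as a (bandwidth, SINR) pair."""
    return sum(link_rate(bw, sinr) for bw, sinr in links)

# Example: three links on 1-MHz sub-channels with different SINRs.
print(system_throughput([(1e6, 10.0), (1e6, 3.2), (1e6, 25.0)]))
```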
Abstract (English) Multi-agent reinforcement learning (MARL) for vehicular communication is a promising topic that has attracted many researchers owing to its ability to solve highly complex optimization problems. In this work, to enhance system throughput and spectrum efficiency, vehicular agents select transmission modes, power levels, and sub-channels to maximize the overall system throughput within clusters. Because, in conventional MARL structures, each agent acts on only a partial observation of the global state, the efficiency of cooperative actions is degraded. We therefore propose a novel MARL resource allocation algorithm for vehicular networks with partial information sharing. We extend the advantage actor-critic (A2C) method to a multi-agent A2C and use long short-term memory (LSTM) to estimate the global state from partial information. Moreover, a comprehensive comparison with landmark schemes is conducted on a realistic setup generated by Simulation of Urban MObility (SUMO). The results show that, with the proposed scheme, the agents achieve favorable performance without full observability of the environment.
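As a rough illustration of the architecture just described (A2C extended to a multi-agent setting, with an LSTM summarizing each agent's partial observations into an estimate of the global state), the sketch below shows what one agent's actor-critic network might look like. This is not the authors' implementation; the observation dimension, layer sizes, and the factorization of the action space into sub-channel, power-level, and transmission-mode choices are assumptions made for illustration.

```python
# Minimal sketch of one vehicular agent's actor-critic network with an
# LSTM that summarizes partial observations over time, in the spirit of
# the MAA2C + LSTM design described in the abstract. Layer sizes, the
# observation dimension, and the action factorization are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim=32, hidden_dim=64,
                 n_subchannels=4, n_power_levels=3, n_modes=2):
        super().__init__()
        # The LSTM aggregates the agent's local observation history,
        # serving as an estimate of the unobserved global state.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # One flattened categorical head over (sub-channel, power, mode).
        self.actor = nn.Linear(hidden_dim,
                               n_subchannels * n_power_levels * n_modes)
        self.critic = nn.Linear(hidden_dim, 1)  # state-value estimate

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of local observations
        out, hidden = self.lstm(obs_seq, hidden)
        feat = out[:, -1]                       # latest belief state
        dist = Categorical(logits=self.actor(feat))
        value = self.critic(feat).squeeze(-1)
        return dist, value, hidden

# Usage: each agent samples a joint resource-allocation action from its
# own policy using only its local observation history.
if __name__ == "__main__":
    agent = LSTMActorCritic()
    obs_history = torch.randn(1, 10, 32)        # 10 past observations
    dist, value, _ = agent(obs_history)
    action = dist.sample()                      # flattened joint action index
    print(int(action), float(value))
```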
Keywords (Chinese) ★ Resource Management
★ Vehicular Networks (V2X)
★ Machine Learning
Keywords (English)
Table of Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 Contribution
1.4 Framework

2 Background and Related Works
2.1 Resource Management
2.2 Reinforcement Learning Model
2.3 POMDP

3 System Model and Problem Formulation
3.1 System Model
3.2 Communication Model

4 Multi-Agent Reinforcement Learning with POMDP for V2X Communication Networks
4.1 The POMDP Model and State-Action-Reward Setting
4.2 Extending A2C to MAA2C
4.3 Proposed Method and Network Structure
4.4 Training Algorithm

5 Numerical Results
5.1 Simulation Setup
5.2 Performance Evaluation in Uniform Traffic Type
5.3 Performance Evaluation in Multiple Traffic Types

6 Conclusion and Future Work
Bibliography
Advisor: Chih-Wei Huang (黃志煒)    Date of Approval: 2020-08-20
