Scalable Radio Resource Management using DDPG Meta Reinforcement Learning

Name

Yen-Chen Chen (陳彥辰)

Department

Department of Communication Engineering

Thesis title

Scalable Radio Resource Management using DDPG Meta Reinforcement Learning

Related theses

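★ Low-Distortion Physical Circuit Layout Protection Based on Mosaic Properties
★ Seamless Handover from Wireless LAN to Mobile Networks under Multipath TCP
★ Budget-Constrained Heterogeneous Sub-Band Allocation in Cognitive Networks
★ Performance Evaluation of Downlink Quality-of-Service Scheduling in Multi-Antenna Transmission Environments
★ Integrated Congestion and Path Control under Multipath TCP
★ Opportunistic Scheduling for Multicast over Wireless Networks
★ Low-Complexity Proportional-Fair Scheduling Design for Multi-User MIMO Systems
★ UE and MIMO Mode Selection in LTE Heterogeneous Networks Using Hybrid Antenna Allocation
★ Heterogeneous Spectrum Allocation Based on Budget-Limited Posted-Price Auctions
★ Scheduling-Based Grouping for MTC Device ID-Sharing Scenarios
★ Efficient Two-Way Vertical Handover with Multipath TCP
★ Congestion and Scheduling Control with Out-of-Order Transmission under Multipath TCP
★ Group Handover Mechanism for Gateway Relocation in Mobile Networks
★ Auction-Based Mobile Data Offloading Using Green-Energy Small Cells
★ Channel Prediction and Proportional-Fair Scheduling Design for High-Speed Railway Environments
★ Hybrid IoT Traffic Generator for Mobile Network Performance Evaluation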

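This electronic thesis is authorized for immediate open access.
Open-access electronic full texts are licensed only for personal, non-profit searching, reading, and printing by users for academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.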

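Abstract (Chinese)

The development of fifth-generation (5G) communication systems has strengthened the capability and flexibility of networks, allowing more extreme and demanding application services to run on 5G, such as massively multiplayer virtual reality (VR) online games. The edge cloud network architecture is expected to improve VR applications effectively. In a multi-user VR environment, however, a user's behavior is affected by other users and by objects in the virtual environment, which makes resource management more complex and more difficult than before. In this study, we adopt the Deep Deterministic Policy Gradient (DDPG) machine learning algorithm for resource management. We integrate a 3D resource management structure, propose componentized actions for the learning agent, and group users according to their interaction states.
Because existing exploration strategies are not suited to long-horizon resource management, we propose an exploration strategy built on a meta-learning framework to strengthen the DDPG algorithm. Another challenge for machine learning is that changing the dimension of the input data renders an already trained model unusable. We propose an "environment-information-to-input" translator that encodes environment information into input before it is fed to the learning algorithm. The encoded input has a fixed dimension, so it can be fed into the already trained model.
The experimental results show that the proposed meta DDPG algorithm achieves the highest satisfaction rate. Although the proposed encoding structure degrades performance slightly, the model can be used directly in a new environment without retraining, which makes it a more efficient way to learn.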
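The fixed-dimension encoding idea can be illustrated with a small sketch. This is not the thesis's translator model, whose architecture is not given in this record. It assumes, purely for illustration, that each base station is described by a feature vector and that a single attention-style pooling step with a hypothetical `query` vector collapses any number of stations into one vector of constant size. The names `encode_to_fixed_state`, `feat_dim`, and `query` are invented for this example.

```python
import numpy as np

def encode_to_fixed_state(bs_features: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Attention-pool a (num_bs, feat_dim) matrix into a (feat_dim,) vector."""
    scores = bs_features @ query              # one relevance score per base station
    scores = scores - scores.max()            # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ bs_features              # weighted average over stations

rng = np.random.default_rng(0)
feat_dim = 4                                  # hypothetical per-station feature count
query = rng.normal(size=feat_dim)             # stands in for a learned parameter

small_env = rng.normal(size=(3, feat_dim))    # environment with 3 base stations
large_env = rng.normal(size=(7, feat_dim))    # environment with 7 base stations

state_small = encode_to_fixed_state(small_env, query)
state_large = encode_to_fixed_state(large_env, query)
print(state_small.shape, state_large.shape)
```

Both calls return a vector of length `feat_dim`, so a policy network trained on one environment size could accept input from the other without changing its input layer. Pooling discards some per-station detail, which is one plausible reason an encoded model might perform slightly worse than a model trained for a single fixed size.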

Abstract (English)

Advances in the capability and flexibility of fifth-generation (5G) systems enable emerging applications with stringent requirements. Mobile edge cloud (MEC) is expected to be an effective way to serve virtual reality (VR) applications over wireless networks. In multi-user VR environments, highly dynamic interaction between users makes radio resource management (RRM) more difficult and complex. Furthermore, a trained management model often becomes obsolete when key environment parameters change. In this thesis, we propose a scalable deep reinforcement learning-based approach designed for resource scheduling in the edge network. We integrate a 3D radio resource structure with componentized Markov decision process (MDP) actions that operate on groups formed by user interactivity. A translator-inspired "information-to-state" encoder is applied to produce a scalable RRM model that can be reused in environments with different numbers of base stations. In addition, a meta-learning-based exploration strategy is introduced to improve exploration during deep deterministic policy gradient (DDPG) training. The results show that the modified meta exploration strategy improves DDPG significantly, and that the scalable learning structure with complete model reuse performs comparably to individually trained models.
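For readers unfamiliar with DDPG, the following sketch shows three standard ingredients of the baseline algorithm: the critic's bootstrapped target, the soft (Polyak) update of target networks, and additive exploration noise on a deterministic action. It is a generic illustration, not the thesis's implementation. The function names are hypothetical, Gaussian noise is used here as a simple stand-in for the noise process in the original DDPG formulation, and the meta-learning-based exploration strategy proposed in the thesis is not reproduced.

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

def noisy_action(action, noise_scale, rng, low=-1.0, high=1.0):
    """Deterministic action plus Gaussian exploration noise, clipped to the action bounds."""
    noise = rng.normal(scale=noise_scale, size=action.shape)
    return np.clip(action + noise, low, high)

rng = np.random.default_rng(0)

# Two transitions: the second one ends its episode, so its target is just the reward.
y = td_target(reward=np.array([1.0, 0.5]),
              next_q=np.array([2.0, 4.0]),
              done=np.array([0.0, 1.0]),
              gamma=0.9)

# Target weights start at 0, online weights at 1; one soft update with tau = 0.1.
target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)

# Exploration around a deterministic action near the bounds.
a = noisy_action(np.array([0.9, -0.9]), noise_scale=0.5, rng=rng)
```

With these inputs, `y` evaluates to `[2.8, 0.5]` and every target weight moves from 0 to 0.1. The noise scale is the knob a fixed exploration scheme leaves constant or decays on a schedule, which is the part a learned exploration strategy would replace.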

Keywords (Chinese)

★ Resource management
★ Virtual reality

Keywords (English)

Table of contents

1 Introduction
1.1 Virtual Reality
1.2 Motivation
1.3 Contribution
1.4 Framework

2 Background and Related Works
2.1 Reinforcement Learning
2.2 Meta Learning
2.3 Resource Management

3 User Interactive Radio Resource Management
3.1 System Model
3.2 Problem Formulation
3.3 User Interaction and Grouping Mechanism

4 Scalable RRM via Meta Reinforcement Learning
4.1 MDP Model
4.2 Componentized RRM with DDPG
4.3 Scalability Extension through Translation Model
4.4 Exploration via Meta-Learning

5 Performance Evaluation
5.1 Simulation Setting
5.2 Performance

6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work

Bibliography


Advisor

黃志煒

Approval date

2020-8-20
