The development of the fifth-generation (5G) communication system has enhanced the capability and flexibility of the network, enabling demanding applications such as massively multiplayer virtual reality (VR) online games to run over 5G networks. The mobile edge cloud architecture is expected to be an effective way to support VR applications. However, in a multi-user VR environment, each user's behavior is influenced by other users and by objects in the virtual environment, which increases the complexity of resource management beyond that of conventional settings. In this study, we adopt the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm for resource management. We integrate a 3D resource management structure, propose componentized actions for the learning agent, and group users according to their interaction states. Because existing exploration strategies are ill-suited to long-horizon resource management, we strengthen DDPG with an exploration strategy built on a meta-learning framework. Another challenge is that changing the dimensionality of the input data renders an already-trained model unusable. We therefore propose an "environment-information-to-input" translator that encodes environment information into a fixed-dimension input before it is fed to the learning algorithm, so that the encoded input can be used with an already-trained model. Experimental results show that the proposed meta-DDPG algorithm achieves the highest satisfaction ratio. Although the proposed encoding structure slightly degrades performance, it allows the trained model to be applied directly to new environments without retraining, which is a more efficient way to learn.

Advances in the capability and flexibility of the fifth-generation (5G) system enable emerging applications with stringent requirements. Mobile edge cloud (MEC) is expected to be an effective solution for serving virtual reality (VR) applications over wireless networks. In multi-user VR environments, highly dynamic interactions among users increase the difficulty and complexity of radio resource management (RRM). Furthermore, a trained management model often becomes obsolete when key environment parameters change. In this thesis, a scalable deep reinforcement learning-based approach is proposed specifically for resource scheduling in the edge network. We integrate a 3D radio resource structure with componentized Markov decision process (MDP) actions to operate on user interactivity-based groups. A translator-inspired "information-to-state" encoder is applied to generate a scalable RRM model, which can be reused across environments with various numbers of base stations. In addition, a meta-learning-based exploration strategy is introduced to improve exploration in the deep deterministic policy gradient (DDPG) training process. The results show that the proposed meta-learning-based exploration strategy significantly improves DDPG, and the scalable learning structure with complete model reuse provides performance comparable to individually trained models.
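The core idea behind the "information-to-state" encoder — mapping environment information from a variable number of base stations into a fixed-dimension input — can be illustrated with permutation-invariant pooling. The abstract does not specify the encoder's internals, so the following is only a minimal sketch under the assumption that each base station contributes a feature vector of equal length:

```python
import numpy as np

def encode_state(bs_features):
    """Encode per-base-station features into a fixed-dimension state.

    bs_features: (n_bs, d) array, one feature vector per base station.
    Returns a vector of length 2*d regardless of n_bs, by concatenating
    mean-pooled and max-pooled features across base stations, so a model
    trained on one network size can accept input from another.
    """
    f = np.asarray(bs_features, dtype=float)
    return np.concatenate([f.mean(axis=0), f.max(axis=0)])

# The state length stays 8 whether the environment has 3 or 7 base stations:
s3 = encode_state(np.random.rand(3, 4))  # 3 base stations, 4 features each
s7 = encode_state(np.random.rand(7, 4))  # 7 base stations, 4 features each
```

Because pooling is order- and count-invariant, the same trained policy network can consume the encoded state for any base-station count, which matches the model-reuse goal stated above.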
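The motivation for a meta-learning-based exploration strategy is that fixed noise schedules explore poorly over long-horizon resource management. As a loose illustration only — not the thesis's actual method, and all names here are hypothetical — exploration noise can be adapted from observed returns, widening when learning stagnates and shrinking when it improves:

```python
import numpy as np

class AdaptiveExplorer:
    """Illustrative adaptive exploration for a DDPG-style agent:
    Gaussian action noise whose scale sigma is adjusted from episode
    returns (this is a simple heuristic stand-in, not the thesis's
    meta-learning strategy)."""

    def __init__(self, sigma=0.2, lo=0.01, hi=1.0):
        self.sigma, self.lo, self.hi = sigma, lo, hi
        self.prev_return = None

    def perturb(self, action, rng):
        # Add exploration noise to the deterministic policy's action.
        return action + rng.normal(0.0, self.sigma, size=np.shape(action))

    def adapt(self, episode_return):
        # Shrink noise when returns improve; widen when they stagnate.
        if self.prev_return is not None:
            self.sigma *= 0.9 if episode_return > self.prev_return else 1.1
            self.sigma = float(np.clip(self.sigma, self.lo, self.hi))
        self.prev_return = episode_return
```

A meta-learned variant would instead learn the adaptation rule itself across training tasks, rather than hard-coding the 0.9/1.1 factors used here for illustration.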