In this thesis, a hybrid beamforming architecture is adopted to reduce the computational complexity and hardware cost of multiple-input multiple-output (MIMO) systems, and deep reinforcement learning (DRL) is employed for beam tracking. Given channel state information (CSI), the deep deterministic policy gradient (DDPG) algorithm is used to compute the analog precoder: the agent's action corresponds to the entries of the analog precoder, and the channel capacity serves as the reward fed back from the environment, so the variation and convergence trend of the reward indicate whether training succeeds. To satisfy the magnitude constraint of the phase shifters, we propose adding a normalization function to the output layer of the actor network in DDPG, which yields good convergence. Exploiting the adaptability of DRL to its environment, the beam-tracking capability of DDPG is further examined in time-varying channels, where the analog precoder is updated as the channel evolves. Channel capacity is used as the performance metric, and the proposed DDPG for Hybrid Precoder Algorithm is compared with two conventional schemes: the Fully-Digital Algorithm and the SVD-based Single-User Hybrid Precoder Algorithm. Simulation results over 100 channel realizations show that, with proper adjustment of the variance of the Ornstein-Uhlenbeck exploration noise, the average performance of the proposed algorithm exceeds that of the conventional single-user hybrid precoder algorithm and approaches that of the optimal fully-digital algorithm under both time-invariant and time-varying channels.
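As a rough illustration of how the actor-output normalization and the capacity reward described above might be realized, the following PyTorch sketch is provided. The network architecture, the real/imaginary parameterization of the analog precoder, and the names `Actor` and `capacity_reward` are assumptions for illustration, not the implementation used in the thesis.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """DDPG actor sketch: maps the observed state (e.g., flattened CSI) to the
    entries of the analog precoder. The final normalization keeps every entry
    at unit magnitude, matching the phase-shifter constraint described above.
    Layer sizes and the real/imaginary parameterization are illustrative."""

    def __init__(self, state_dim, num_tx_antennas, num_rf_chains, hidden_dim=256):
        super().__init__()
        self.num_tx = num_tx_antennas
        self.num_rf = num_rf_chains
        num_entries = num_tx_antennas * num_rf_chains
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            # Two outputs per precoder entry: real and imaginary parts.
            nn.Linear(hidden_dim, 2 * num_entries),
        )

    def forward(self, state):
        real, imag = self.net(state).chunk(2, dim=-1)
        # Normalization layer: project each entry onto the unit circle,
        # i.e. keep only its phase, so |F_RF[i, j]| = 1 for every entry.
        magnitude = torch.sqrt(real ** 2 + imag ** 2).clamp_min(1e-12)
        f_rf = torch.complex(real / magnitude, imag / magnitude)
        # Analog precoder of shape (batch, N_t, N_RF).
        return f_rf.view(-1, self.num_tx, self.num_rf)


def capacity_reward(H, F, snr, num_streams):
    """Scalar reward sketch: spectral efficiency (bits/s/Hz) of the effective
    precoder F = F_RF @ F_BB over channel H, assuming equal power allocation
    and identity noise covariance (a standard single-user MIMO expression)."""
    num_rx = H.shape[0]
    eff = H @ F                                     # (N_r, N_s) effective channel
    gram = (snr / num_streams) * eff @ eff.conj().transpose(-2, -1)
    eye = torch.eye(num_rx, dtype=gram.dtype)
    return torch.log2(torch.linalg.det(eye + gram).real)
```

Projecting each output entry onto the unit circle enforces the phase-shifter constraint by construction while remaining differentiable, so the deterministic policy gradient can flow through the normalization layer. One natural choice (again an assumption of this sketch) is to add the Ornstein-Uhlenbeck exploration noise to the raw outputs before the normalization, so that the perturbed action still satisfies the constraint.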