In this thesis, we use reinforcement learning (RL) to investigate the trajectory design, user association, and power control of a multi-antenna unmanned aerial vehicle (UAV) in a wireless powered communication network (WPCN). The UAV departs from the power station (PS) with a fully charged battery and flies to the vicinity of the devices. In the downlink, it charges the devices through wireless power transfer (WPT): each device harvests energy from the radio frequency (RF) signal transmitted by the UAV and then uses the harvested energy to transmit data to the UAV in the uplink. We jointly design the trajectory, power control, and user association of a battery-constrained UAV to maximize the system throughput, i.e., the total amount of data received by the UAV. To reflect practical conditions, the UAV learns the channel state from the received signal strength indicator (RSSI) reported back by the devices. In addition, to guarantee the quality of service (QoS) so that all devices transmit similar amounts of data, we impose a threshold on the amount of data each device may transmit. RL is a powerful tool for solving non-convex problems, but it suffers from the curse of dimensionality. Deep reinforcement learning (DRL) can circumvent the curse of dimensionality, but the mechanism behind its decisions is difficult to interpret.
Therefore, we propose an improved RL algorithm that estimates the expected value of the current state by averaging the previously learned Q-values of its neighboring states. This simple method alleviates the difficulty of finding the optimal solution when the state space is too large. Numerical results demonstrate that the proposed algorithm circumvents the curse of dimensionality and has lower complexity than existing non-linear approximation methods.
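The abstract states only that the improved RL estimates a state's value by averaging the Q-values of neighboring, previously learned states. The Python sketch below illustrates that idea under our own assumptions and is not the thesis's exact algorithm. States are assumed to be integer tuples (for example, a discretized UAV grid position and battery level), and "neighbors" are assumed to be states within Chebyshev distance `radius`. The class name `NeighborAveragedQ` and its parameters are hypothetical.

```python
import itertools
import random


class NeighborAveragedQ:
    """Hypothetical sketch: a tabular Q-learner whose unvisited entries are
    estimated as the mean of the learned Q-values of neighboring states.

    Assumptions (not from the thesis): states are tuples of ints, and a
    neighbor is any other state within Chebyshev distance `radius`.
    """

    def __init__(self, actions, radius=1, alpha=0.1, gamma=0.9, default=0.0):
        self.actions = list(actions)
        self.radius = radius      # neighborhood size (assumed)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.default = default    # fallback when no neighbor has been learned
        self.table = {}           # (state, action) -> learned Q-value

    def _neighbors(self, state):
        # Every state whose coordinates differ by at most `radius`, excluding itself.
        offsets = range(-self.radius, self.radius + 1)
        for delta in itertools.product(offsets, repeat=len(state)):
            if any(delta):
                yield tuple(s + d for s, d in zip(state, delta))

    def value(self, state, action):
        key = (state, action)
        if key in self.table:
            return self.table[key]
        # Unvisited pair: average the learned Q-values of visited neighbors.
        known = [self.table[(n, action)] for n in self._neighbors(state)
                 if (n, action) in self.table]
        return sum(known) / len(known) if known else self.default

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.value(state, a))

    def act(self, state, epsilon):
        # Epsilon-greedy exploration.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning step; an unvisited pair starts from its
        # neighbor-averaged estimate rather than from zero.
        target = reward + self.gamma * max(self.value(next_state, a) for a in self.actions)
        current = self.value(state, action)
        self.table[(state, action)] = current + self.alpha * (target - current)


if __name__ == "__main__":
    q = NeighborAveragedQ(actions=["hover", "move"], radius=1)
    q.table[((0, 0), "hover")] = 2.0  # learned earlier at grid cell (0, 0)
    q.table[((1, 1), "hover")] = 4.0  # learned earlier at grid cell (1, 1)
    print(q.value((0, 1), "hover"))   # unvisited: mean of 2.0 and 4.0 -> 3.0
    print(q.greedy((0, 1)))           # "hover" (3.0 beats the default 0.0 for "move")
```

Enumerating every offset costs $(2r+1)^d - 1$ lookups per query for a $d$-dimensional state, so a practical variant might instead search only the states already stored in the table. The averaging step itself is the part that mirrors the abstract's description.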