Multi-agent Reinforcement Learning for Autonomous Tracking Using a Swarm of UAVs

Name

Deng-Kai Chang (張登凱)

Department

Department of Communication Engineering

Thesis title

Multi-agent Reinforcement Learning for Autonomous Tracking Using a Swarm of UAVs
(基於多代理人強化學習方法多架無人機自主追蹤之研究)

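Related theses

★ Joint design of path planning and base station association for connected UAVs: an imitation-augmented deep reinforcement learning approach
★ Network coding for multi-UAV networks to defend against machine learning attacks
★ Three-dimensional trajectory design of aerial base stations in wireless caching networks using multi-agent reinforcement learning
★ Aerial wireless sensing: RSS-based device-free human detection and localization under UAV jitter
★ A cooperative autoencoder-embedded optimization method for received-signal-strength-assisted localization in UAV swarm networks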

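This electronic thesis is authorized for immediate open access.
Full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, so as to avoid violating the law.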

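Abstract (Chinese)

In this thesis, we aim to design an autonomous tracking system for a swarm of unmanned aerial vehicles (UAVs) that localizes a moving target carrying a radio frequency (RF) sensor. In this system, UAVs equipped with omnidirectional received signal strength (RSS) sensors cooperatively search for the target at a given tracking accuracy. To achieve rapid tracking and localization in a highly dynamic channel environment, we formulate the UAV flight decision problem as a constrained Markov decision process whose main purpose is to avoid redundant flight decisions. We then propose an enhanced multi-agent reinforcement learning method that coordinates multiple UAVs in real-time target tracking. The core of the proposed framework is a feedback control system that also accounts for the uncertainty of the channel estimate. In addition, we prove that the algorithm converges to the optimal decision. Finally, we evaluate the proposed framework and algorithm by building a highly dynamic channel environment and generating synthetic data. According to the simulation results and rigorous mathematical proofs, the proposed framework completes the tracking and localization task in finite time. The results further demonstrate the feasibility of the proposed framework: compared with conventional reinforcement learning methods, it reduces search time by 30% to 50% and raises the task completion rate by 20%.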
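For readers unfamiliar with the term, a constrained Markov decision process is commonly written in the textbook form below. This is a generic sketch, not the formulation used in the thesis (which this record does not reproduce): the policy $\pi$, reward $r$, cost $c$, discount factor $\gamma$, and budget $C_{\max}$ are placeholder symbols assumed here for illustration.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le C_{\max}
\end{aligned}
```

In a tracking setting, one plausible reading is that $r$ rewards progress toward the target while $c$ penalizes flight decisions that add no information. The thesis defines its own constraint in Section 3.3.2.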

Abstract (English)

In this thesis, we design an autonomous tracking system in which a swarm of unmanned aerial vehicles (UAVs) localizes a mobile radio frequency (RF) target.
Each UAV is equipped with an omnidirectional received signal strength (RSS) sensor, and the UAVs cooperatively search for the target to within a specified accuracy.
To achieve rapid tracking and localization in a highly dynamic channel environment (e.g., time-varying transmit power and intermittent signals), we formulate the UAV flight decision problem as a constrained Markov decision process.
The main objective is to avoid redundant UAV flight decisions.
We then propose an enhanced multi-agent reinforcement learning scheme that lets multiple UAVs carry out real-time tracking missions cooperatively.
The core of the proposed scheme is a feedback control system that takes into account the uncertainty of the channel estimate.
We also prove that the proposed algorithm converges to the optimal decision.
Finally, simulation results show that the proposed scheme outperforms traditional reinforcement learning algorithms (namely Q-learning and multi-agent Q-learning), reducing search time by 30% to 50% and raising the probability of successful localization by 20%.
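The Q-learning baseline named in the abstract can be illustrated with a minimal tabular sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the one-dimensional search line, the reward values, the hyperparameters, and the function names `q_learning_update` and `train_line_search` are all hypothetical.

```python
import random


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def train_line_search(n_cells=6, target=5, episodes=300, epsilon=0.2, seed=0):
    """Toy stand-in for a single-UAV search (hypothetical environment).

    The agent starts in cell 0 and moves left (action 0) or right (action 1)
    along a line until it reaches the target cell or hits the step cap.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_cells)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so every episode terminates
            if state == target:
                break
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            step = 1 if action == 1 else -1
            next_state = min(max(state + step, 0), n_cells - 1)
            # Assumed reward: small penalty per move, bonus on reaching the target.
            reward = 1.0 if next_state == target else -0.01
            q_learning_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    table = train_line_search()
    policy = ["right" if row[1] > row[0] else "left" for row in table[:-1]]
    print(policy)
```

The per-move penalty in this toy reward plays a role loosely similar to discouraging redundant flight decisions. The thesis handles that through an explicit constraint rather than reward shaping, so the sketch should be read only as the unconstrained baseline.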

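Keywords (Chinese)

★ Multi-agent reinforcement learning
★ Unmanned aerial vehicles
★ Tracking and localization
★ Constrained Markov decision process

Keywords (English)

★ Multi-agent reinforcement learning
★ Unmanned aerial vehicles (UAVs)
★ Localization and tracking
★ Constrained Markov decision process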

Table of contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of figures
List of tables
1. Introduction
1.1 Research background
1.2 Research motivation and objectives
1.3 Thesis organization
2. Literature review
2.1 Filter-based UAV tracking and localization
2.2 Machine-learning-based UAV tracking and localization
2.3 Summary of perspectives
3. System model
3.1 UAV trajectory model
3.2 Air-to-ground channel model
3.3 Problem formulation
3.3.1 Markov decision process model
3.3.2 Constrained Markov decision process model
4. Single-UAV tracking and localization
4.1 Single-UAV autonomous tracking and localization problem
4.2 Single-agent Q-learning
4.3 Limitations of Q-learning-based single-UAV autonomous tracking and localization
5. Multi-UAV tracking and localization
5.1 Multi-UAV autonomous tracking and localization problem
5.2 Multi-agent Q-learning
5.3 Limitations of multi-agent-Q-learning-based multi-UAV autonomous tracking and localization
6. Joint multi-UAV autonomous tracking and localization
6.1 Joint multi-UAV autonomous tracking and localization problem
6.2 Basic concepts
6.3 Centralized learning with angle constraints based on Gaussian process regression
6.4 Policy decision mechanism
6.5 Analysis of the reinforcement learning algorithm
7. Simulation results and analysis
7.1 Simulation setup
7.2 Simulation performance results
8. Conclusion and contributions
References
Appendix 1
Appendix 2
Appendix 3

Advisor

Yu-Jia Chen (陳昱嘉)

Date of approval

2020-7-23
