NCU Institutional Repository (中大機構典藏): Item 987654321/93255


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93255


    Title: Hindsight Proximal Policy Optimization based Deep Reinforcement Learning Manipulator Control
    Author: Su, Sheng-Che (蘇聖哲)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: robotic manipulator; deep reinforcement learning; hindsight proximal policy optimization; robot control
    Date: 2023-07-25
    Upload time: 2024-09-19 16:50:48 (UTC+8)
    Publisher: National Central University
    Abstract: The demand for intelligent automation in factories is steadily increasing. Traditional manipulators perform simple, fixed automation tasks, while deep reinforcement learning enables manipulators to handle more complex work. However, deep reinforcement learning in robotics often faces difficult learning problems: in three-dimensional, continuous environments the robot rarely obtains a reward, a setting known as a sparse reward environment. To overcome this issue, this study proposes Hindsight Proximal Policy Optimization (HPPO), a deep reinforcement learning method for intelligent manipulator control. HPPO combines the PPO (Proximal Policy Optimization) algorithm with the idea behind HER (Hindsight Experience Replay) to improve PPO's adaptability and sample efficiency in sparse reward environments. Unlike conventional reinforcement learning setups, it adopts a multi-goal formulation, giving the agent an explicit goal while it interacts with the environment, and borrows HER's fictitious-data generation so the agent can learn from failures and reach its goals faster. A series of experiments in a simulated manipulator control environment compared HPPO against other deep reinforcement learning methods. The results show that HPPO, built on PPO as its core algorithm, improves significantly on the baseline: it adapts better to sparse reward environments and raises sample efficiency, which in turn speeds up training. This validates HPPO's practicality for manipulator control and its potential as a basis for a variety of robot control applications. (A minimal code sketch of the hindsight relabeling idea follows this record.)
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses
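
    The key mechanism the abstract describes is HER-style hindsight goal relabeling grafted onto PPO: transitions from failed episodes are duplicated with the desired goal replaced by a goal actually achieved later in the episode, so the sparse reward signal becomes informative. The sketch below, in plain Python with NumPy, illustrates only that relabeling step; it is not the author's code, and the names (Transition, relabel_episode, the "future" sampling strategy, the 0.05 success threshold) are illustrative assumptions.

        # Minimal sketch of HER-style hindsight relabeling (illustrative, not
        # the thesis implementation). Failed transitions are copied with the
        # desired goal swapped for a goal achieved later in the same episode,
        # turning sparse failures into fictitious successes the agent can
        # learn from.
        import random
        from dataclasses import dataclass, replace

        import numpy as np


        @dataclass(frozen=True)
        class Transition:
            obs: np.ndarray            # robot state, e.g. joint angles
            achieved_goal: np.ndarray  # goal actually reached at this step
            desired_goal: np.ndarray   # goal the agent was asked to reach
            action: np.ndarray
            reward: float


        def sparse_reward(achieved, desired, threshold=0.05):
            # Sparse reward: 0 on success (within threshold of the goal), -1 otherwise.
            # The 0.05 threshold is an assumed example value.
            return 0.0 if np.linalg.norm(achieved - desired) < threshold else -1.0


        def relabel_episode(episode, k=4):
            # For each step, add up to k hindsight copies whose desired goal is
            # an achieved goal sampled from a LATER step (the "future" strategy
            # from the HER paper), with the sparse reward recomputed.
            out = list(episode)
            for t, tr in enumerate(episode):
                future = episode[t + 1:]
                for fut in random.sample(future, min(k, len(future))):
                    g = fut.achieved_goal
                    out.append(replace(tr, desired_goal=g,
                                       reward=sparse_reward(tr.achieved_goal, g)))
            return out


        # Example: a 3-step episode that never reached its desired goal still
        # yields hindsight "successes" (3 originals + 3 relabeled copies here,
        # limited by episode length).
        rng = np.random.default_rng(0)
        ep = [Transition(obs=rng.standard_normal(7),
                         achieved_goal=rng.standard_normal(3),
                         desired_goal=np.ones(3),
                         action=rng.standard_normal(7),
                         reward=-1.0)
              for _ in range(3)]
        print(len(relabel_episode(ep)))  # -> 6

    Note that HER was designed for off-policy learners with replay buffers; folding relabeled transitions into on-policy PPO, as the thesis proposes, additionally requires recomputing the policy's action log-probabilities under the new goals before the clipped surrogate update, a step this sketch omits.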

    Files in This Item:

    File          Description    Size    Format    Views
    index.html                   0Kb     HTML      14


    All items in NCUIR are protected by the original copyright.

