RS-NAS:基於策略與價值含獎勵重構之強化學習於網路結構搜索;RS-NAS: A Policy and Value-Based Reinforcement Learning with Reward Shaping on Neural Architecture Search

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93172

Title:	RS-NAS:基於策略與價值含獎勵重構之強化學習於網路結構搜索;RS-NAS: A Policy and Value-Based Reinforcement Learning with Reward Shaping on Neural Architecture Search
Author:	張慕平;ZHANG, MU-PING
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Reinforcement Learning;Sparse Reward;Neural Architecture Search
Date:	2023-07-24
Upload time:	2024-09-19 16:45:43 (UTC+8)
Publisher:	National Central University
Abstract:	Over the past decade, deep learning has become a popular research area as hardware performance has improved. In computer vision, Convolutional Neural Networks (CNNs) are a widely known and highly successful technique. Researchers have observed that more complex network models often achieve higher accuracy, but the heavy resource consumption of such models greatly limits the use of CNNs on resource-constrained end devices. As a result, much recent research has focused on Neural Architecture Search (NAS), a technique for automatically designing network models according to different objectives. NAS methods can be grouped by optimization method into three categories: Reinforcement Learning (RL), Evolutionary Algorithms (EA), and Differentiable Optimization. In this thesis, we propose a novel reward shaping mechanism, called RS-NAS, for RL-based NAS. Its objective is to address the sparse-reward challenge that RL encounters during the NAS search process: the agent cannot obtain rewards during the search and receives a reward only for the model architecture produced at the final step. This prevents the agent from evaluating the quality of each step in the search and thus reduces overall search performance.
RS-NAS is implemented with two RL algorithms: Proximal Policy Optimization (PPO), a policy-based method, and Deep Q Network (DQN), a value-based method. To reduce search cost and confounding factors, so that different methods are compared as far as possible under the same standard, we use NATS as the search space. Compared with the original RL methods in NATS, experimental results show that RS-NAS achieves better search performance and stability.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	20

All items in NCUIR are protected by original copyright.
