Using Reinforcement Learning to Support Outbreak Management and Spatiotemporal Analysis of COVID-19 Epidemiology in Japan

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/94833

Title:	Using Reinforcement Learning to Support Outbreak Management and Spatiotemporal Analysis of COVID-19 Epidemiology in Japan
Authors:	高雅敏;Kao, Ya-Min
Contributors:	Department of Biomedical Sciences and Engineering
Keywords:	COVID-19;Reinforcement Learning;SEIQR Model;Population-weighted Density;Spatiotemporal Analysis;Hierarchical Clustering
Date:	2024-07-23
Issue Date:	2024-10-09 15:32:51 (UTC+8)
Publisher:	National Central University
Abstract:	Background: Containment measures that restrict movement slowed the spread of COVID-19 but also triggered a global economic crisis. We built a reinforcement learning (RL) algorithm that balances containment policy against economic activity and can serve as a reference for policy-making. To draw lessons from the COVID-19 pandemic for future preparedness, we also conducted a spatiotemporal analysis that clustered the 47 prefectures of Japan by daily incidence and identified possible risk factors.
Methods: We designed an RL environment with four regions representing Tokyo, Osaka, Okinawa, and Hokkaido, Japan. The regions were connected to one another through each region's central transport hub, and each region had its own Susceptible-Exposed-Infectious-Quarantined-Recovered (SEIQR) model. The RL agent learned a policy by obtaining daily observations from the environment, issuing two kinds of actions (movement and screening), and receiving feedback from the reward function. The trained agent was then placed in environments simulating five epidemic waves to evaluate its performance and analyze the timing of its actions. In the spatiotemporal analysis, we computed Pearson correlation coefficients between the daily incidence time series of the 47 prefectures and applied hierarchical clustering to those coefficients to examine overall similarity and variation among prefectures. We used linear regression to identify risk factors for incidence and mortality. We also mapped each prefecture's incidence, mortality, and expected effective reproduction number for each epidemic wave and verified the risk factors for each wave using linear regression.
Results: The trained agent flattened the peaks of infectious cases and shortened each of the five epidemic waves covered in the RL study. The agent was usually strict on screening but lenient on movement, except for Okinawa, where both actions were generally lenient. Action timing analyses indicated that restrictions on movement were raised when the number of exposed or infectious cases remained high or increased rapidly, and that screening was eased when the number of exposed or infectious cases dropped quickly or fell to a regional low. For Okinawa, screening was tightened when the number of exposed or infectious cases increased rapidly. The spatiotemporal analysis showed both similarities and differences among incidence curves: Okinawa and the major metropolitan areas had relatively higher infection risk, while the Tohoku and Chubu prefectures had relatively lower risk. Lower latitude and lower vaccination rates were important risk factors. The comparison among waves also revealed significant deviations: prefectures that were hit hard early showed a turning point in the seventh wave and a clear slowdown in the eighth wave, which suggests that herd immunity may have been reached.
Conclusions: The RL experiments showed the potential of RL models to assist policy-making and demonstrated how the semi-connected SEIQR models created an interactive environment for imitating movement behavior. Moreover, the findings from the spatiotemporal analysis provide critical information on regional risk and evidence of herd immunity, and can support authorities in future policy-making and resource allocation.
Appears in Collections:	[Institute of Biomedical Engineering] Electronic Thesis & Dissertation
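
The clustering step described in the abstract (hierarchical clustering on Pearson correlation coefficients of daily incidence) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not state the linkage method or distance definition, so average linkage and the distance 1 - r are assumptions, as are the function names `pearson` and `cluster` and the toy series, which are made-up numbers rather than prefecture data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Assumes neither series is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster(series, n_clusters):
    """Agglomerative clustering of named series.

    Assumptions (not stated in the record): distance = 1 - r,
    average linkage, merging stops at n_clusters groups.
    """
    names = list(series)
    # Pairwise correlation distance between every pair of series.
    dist = {
        (a, b): 1.0 - pearson(series[a], series[b])
        for a in names
        for b in names
        if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy daily-incidence series (hypothetical numbers, not data from the thesis).
toy = {
    "A": [1, 2, 4, 8, 4, 2],
    "B": [2, 4, 8, 16, 8, 4],
    "C": [8, 4, 2, 1, 2, 4],
    "D": [9, 5, 2, 1, 3, 5],
}
groups = cluster(toy, 2)
```

In this toy example, series A and B rise and fall together and end up in one group, while C and D follow the opposite pattern and form the other. Using 1 - r as the distance means that curves with the same shape cluster together even when their absolute incidence differs, which fits a comparison of epidemic patterns across regions of different size.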

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	43	View/Open
