    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/95635


    Title: 混合式SRAM-RRAM記憶體內運算架構: 高效可靠的深度學習推理解決方案; Hybrid SRAM-RRAM Computing-In-Memory Architecture: A Solution for Efficient and Reliable Deep Learning Inference
    Author: 劉致瑋; Liu, Zhi-Wei
    Contributor: Department of Electrical Engineering
    Keywords: Memory
    Date: 2024-07-17
    Upload time: 2024-10-09 17:06:54 (UTC+8)
    Publisher: National Central University
    Abstract: The von Neumann architecture (VNA) is the fundamental structure of today's computer systems. It consists of a central processing unit (CPU) and memory, connected by data channels and control signals: the CPU executes instructions stored in memory, and memory holds both instructions and data. Data-intensive applications such as image classification, speech recognition, and natural language processing, however, transfer large amounts of data between memory and the computing cores, which gives rise to the von Neumann bottleneck. The communication speed between the CPU and memory is limited, so the CPU must wait for memory to respond, and this limits overall system performance.
    To address the von Neumann bottleneck, attention has shifted to Computing In-Memory (CIM) as a promising solution. CIM moves computation into the memory so that data is processed where it is stored; this reduces communication between the CPU and memory and improves system efficiency and performance. Many researchers have proposed CIM architectures to accelerate AI computation. Broadly, CIM falls into two types: analog and digital. In recent years, analog CIM has received widespread attention for its high parallelism and energy efficiency, and it is therefore the focus of this work. Among the various memory types, SRAM (Static Random-Access Memory) and RRAM (Resistive Random-Access Memory) stand out as popular choices.
    SRAM-based CIM architectures rely on a mature, stable device process and have demonstrated efficient and reliable computation. However, SRAM cells have a relatively large cell area and low storage density, which increases the required chip area. In contrast, RRAM-based CIM architectures offer high density, low power consumption, non-volatility, and seamless integration with CMOS processes. However, they face challenges related to process yield variation, which results in various types of faults. Although both CIM architectures significantly improve computation speed, each has its own advantages and disadvantages.
    To make the most of the advantages of both, we propose a novel hybrid SRAM-RRAM CIM architecture that computes in place on the weights stored in the memory arrays, using specially designed peripheral circuits to integrate the SRAM and RRAM structures. We also introduce a novel weight allocation strategy, the Weight Storage Strategy (WSS), which distributes the Most Significant Bits (MSBs) and Least Significant Bits (LSBs) of each weight to different memory arrays according to their importance. The MSBs have a greater impact on the computation, so they are stored in the relatively stable SRAM array; the LSBs, which are typically more numerous and less critical, are stored in the smaller-area RRAM array. Experimental results show that our architecture outperforms 8T-SRAM-based CIM architectures by approximately 35%, 40%, and 50% in area, leakage power, and energy consumption, respectively. In terms of reliability, it outperforms the RRAM-based architecture by about 32% and 18% when evaluated on the MNIST and hand detection datasets, respectively.
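The MSB/LSB split described in the abstract can be illustrated with a small software sketch. This is not the thesis's circuit or its actual WSS implementation. It only shows, under stated assumptions, why the partial sums from two arrays can be recombined exactly and why a fault in an LSB matters less than one in an MSB. The assumptions, none of which come from the abstract, are unsigned 8-bit weights, a 4/4 bit split, a single bit-flip fault model, and the hypothetical helper names `split_weight`, `hybrid_dot`, and `flip_bit`.

```python
def split_weight(w, total_bits=8, msb_bits=4):
    """Split an unsigned weight into (msb_part, lsb_part).

    The 8-bit width and the 4/4 split are illustrative assumptions.
    """
    if not 0 <= w < (1 << total_bits):
        raise ValueError("weight out of range for the given bit width")
    lsb_bits = total_bits - msb_bits
    msb_part = w >> lsb_bits               # upper bits: would live in the SRAM array
    lsb_part = w & ((1 << lsb_bits) - 1)   # lower bits: would live in the RRAM array
    return msb_part, lsb_part


def hybrid_dot(inputs, weights, total_bits=8, msb_bits=4):
    """Dot product computed as two partial sums, then recombined by a shift."""
    lsb_bits = total_bits - msb_bits
    parts = [split_weight(w, total_bits, msb_bits) for w in weights]
    sram_sum = sum(x * m for x, (m, _) in zip(inputs, parts))
    rram_sum = sum(x * l for x, (_, l) in zip(inputs, parts))
    return (sram_sum << lsb_bits) + rram_sum


def flip_bit(w, bit):
    """Hypothetical fault model: invert one stored bit of a weight."""
    return w ^ (1 << bit)


inputs = [1, 2, 3, 4]
weights = [200, 17, 96, 255]

exact = sum(x * w for x, w in zip(inputs, weights))
recombined = hybrid_dot(inputs, weights)

# Inject one bit flip into the first weight: once in bit 0 (an LSB),
# once in bit 7 (an MSB), and compare the resulting output errors.
lsb_faulty = [flip_bit(weights[0], 0)] + weights[1:]
msb_faulty = [flip_bit(weights[0], 7)] + weights[1:]
lsb_error = abs(hybrid_dot(inputs, lsb_faulty) - exact)
msb_error = abs(hybrid_dot(inputs, msb_faulty) - exact)

print(exact, recombined)     # 1542 1542
print(lsb_error, msb_error)  # 1 128
```

In this toy setting the same single-bit fault changes the output by 1 when it hits the lowest bit and by 128 when it hits the highest, which is the intuition behind keeping the MSBs in the more stable array. The reliability figures reported in the abstract come from the author's own experiments, not from a model like this one.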
    Appears in collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses
