

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93010


    Title: Traditional Chinese Scene Text Recognition based on Transformer Architecture (基於Transformer架構之繁體中文場景文字辨識系統)
    Author: Tsai, Wei-Ting (蔡維庭)
    Contributor: Department of Computer Science and Information Engineering, In-service Master Program
    Keywords: Traditional Chinese recognition; Transformer architecture; Scene text recognition
    Date: 2023-06-27
    Upload time: 2024-09-19 16:38:24 (UTC+8)
    Publisher: National Central University
    Abstract: In the task of text recognition in Traditional Chinese scenes, the system must be able to process both the image and text modalities. Because Traditional Chinese characters have complex structures and the character set is very large, accurate recognition typically demands complex model and system architectures and substantial computational resources. To enable real-time Traditional Chinese recognition on edge devices with limited hardware, this research proposes a recognition system with a dynamically adjustable architecture. The system consists of a recognition subsystem and a correction subsystem: the recognition subsystem incorporates the lightweight recognition model SVTR, while the correction subsystem is built around a bidirectional cloze language model. The two subsystems are designed on the Transformer encoder and decoder architectures, respectively. Through attention mechanisms and multiple down-sampling operations, the output features can attend to information at different scales: local features capture character structure and strokes, while global features capture semantic information between characters. The model architecture can therefore be simplified, reducing the number of parameters. During training, we separate the gradient propagation between the two models so that each can operate independently.
    At inference time, the system adjusts its configuration to the scale of the hardware environment: the recognition subsystem, which has fewer parameters, runs on hardware-limited machines, while the full system including the correction subsystem is deployed on servers with greater computational resources. Experiments show that the recognition subsystem's parameters occupy only 11.45 MB while reaching 71% accuracy; with the correction subsystem added, accuracy rises to 77%.
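The dynamic deployment scheme described in the abstract, with the lightweight recognizer alone on constrained edge devices and the recognizer plus the correction model on better-provisioned servers, can be sketched as follows. This is an illustrative outline only, not the thesis code; all names and the memory threshold are hypothetical.

```python
# Illustrative sketch (not the thesis implementation): choosing which
# subsystems a host should run, following the abstract's description.
# Stage names and the headroom factor are hypothetical.

RECOGNIZER_PARAM_MB = 11.45  # reported parameter size of the SVTR recognizer

def build_pipeline(available_memory_mb: float, is_server: bool) -> list[str]:
    """Return the list of stages this host should run.

    Edge devices run only the lightweight SVTR recognizer; servers with
    enough memory also run the bidirectional cloze correction model.
    """
    stages = ["svtr_recognizer"]
    # Require some headroom beyond the recognizer's own parameters before
    # adding the heavier correction stage (factor chosen arbitrarily here).
    if is_server and available_memory_mb > RECOGNIZER_PARAM_MB * 4:
        stages.append("cloze_corrector")
    return stages
```

Under this sketch, an edge device would run only the recognition stage, which per the abstract reaches 71% accuracy on its own, while a server running both stages corresponds to the full system that reaches 77%.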
    Appears in Collections: [In-service Master Program, Department of Computer Science and Information Engineering] Master's and Doctoral Theses


    All items in NCUIR are protected by copyright.
