

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/83789


    Title: 基於注意力殘差網路之繁體中文街景文字辨識 (Traditional Chinese Scene Text Recognition based on Attention-Residual Network)
    Author: Su, Kung-Yu (蘇冠宇)
    Contributor: Institute of Software Engineering
    Keywords: computer vision; deep learning; scene text detection; scene text recognition; Traditional Chinese character recognition; synthetic data
    Date: 2020-07-29
    Upload time: 2020-09-02 17:06:28 (UTC+8)
    Publisher: National Central University
    Abstract: Signboard text in street scenes conveys rich information, and recognizing it automatically would benefit many applications. Although optical character recognition of scanned documents is mature, scene text recognition remains challenging due to variable fonts, text sizes, inconsistent lighting, different orientations, background noise, camera shooting angles, and image distortion. Traditional Chinese training data are also still scarce: the large character set makes it hard to collect images evenly for every character, so even a sufficiently large collection suffers from data imbalance. This research therefore renders high-quality training images with automatic labels from several Traditional Chinese fonts, simulating the complex text variations of street views while avoiding the errors of manual labeling. To bring the synthetic images closer to real street text, the data are further augmented by adjusting brightness, applying geometric transformations, and adding character outlines, which diversifies the training set and improves model robustness. Detection and recognition follow a two-stage pipeline: a DeepLab model first locates single characters and text lines via semantic segmentation, then a Spatial Transformer Network (STN) rectifies skewed character crops to ease feature extraction in the recognition stage. The recognizer is a modified ResNet50 in which an attention mechanism improves accuracy on this large-scale classification task. Finally, the recognized characters are combined along the detected text lines and cross-checked against place names from the Google Place API using the user's GPS position, which verifies and corrects the model's output. Experimental results show that the proposed scheme effectively detects and recognizes Traditional Chinese street-view text and outperforms Line OCR and Google Vision on complex street scenes.
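The photometric and geometric augmentation described in the abstract could be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline; the function name, parameter values, and the specific jitter operations (brightness scaling plus a row-wise horizontal shear) are assumptions chosen for simplicity:

```python
import numpy as np

def augment(glyph, brightness=0.8, shear_px=3, seed=0):
    """Apply simple photometric + geometric jitter to a rendered glyph.

    glyph: uint8 array (H, W). The brightness factor, shear amount, and
    noise level are illustrative placeholders, not values from the thesis.
    """
    rng = np.random.default_rng(seed)
    img = glyph.astype(np.float32)
    # Photometric jitter: scale brightness and add mild pixel noise.
    img = img * brightness + rng.normal(0.0, 2.0, img.shape)
    img = np.clip(img, 0, 255).astype(np.uint8)
    # Geometric jitter: horizontal shear, shifting each row proportionally.
    h, _ = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(round(shear_px * y / max(h - 1, 1)))
        out[y] = np.roll(img[y], shift)
    return out

glyph = np.full((32, 32), 200, dtype=np.uint8)  # stand-in rendered character
aug = augment(glyph)
```

A real generator would render each character with several fonts and add outlines and backgrounds; the point here is only that labels come for free because the source glyph is known.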
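The "attention-residual" idea, i.e. reweighting feature channels before the residual addition, could look roughly like the squeeze-and-excitation sketch below. This is my own minimal numpy rendition under assumed shapes; the thesis's actual attention design inside ResNet50 may differ:

```python
import numpy as np

def se_residual_block(x, w1, w2):
    """Channel attention on a residual branch (squeeze-and-excitation style).

    x:  feature map of shape (C, H, W)
    w1: bottleneck weights (C, C // r)
    w2: expansion weights (C // r, C)
    """
    squeeze = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)         # bottleneck + ReLU
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate -> (C,)
    return x + x * scale[:, None, None]            # reweight channels + residual

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))    # toy feature map, 8 channels
w1 = rng.standard_normal((8, 2)) * 0.1
w2 = rng.standard_normal((2, 8)) * 0.1
y = se_residual_block(x, w1, w2)
```

The gate is bounded in (0, 1), so the block can only attenuate or pass each channel's contribution before it is added back to the identity path.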
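The final correction step, matching recognized text against place names retrieved near the user's GPS position, amounts to fuzzy string matching. A stdlib sketch, assuming the Place API has already returned a list of nearby names (the function name, threshold, and example strings are all hypothetical):

```python
import difflib

def correct_with_places(ocr_text, nearby_places, threshold=0.6):
    """Replace an OCR result with the most similar nearby place name,
    if the similarity clears a (hypothetical) threshold."""
    best, best_score = ocr_text, 0.0
    for name in nearby_places:
        score = difflib.SequenceMatcher(None, ocr_text, name).ratio()
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else ocr_text

# Toy example: one misrecognized character (中夬 instead of 中央).
places = ["中央大學", "永和豆漿", "全聯福利中心"]
corrected = correct_with_places("中夬大學", places)
```

Here the mostly-correct OCR string is snapped to the nearest place name, while an unrelated string would fall below the threshold and pass through unchanged.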
    Appears in Collections: [Institute of Software Engineering] Master's and doctoral theses

