Character Segmentation in Scene-Text Images Based on Weakly Supervised Learning (基於弱監督式學習之自然場景文字字元分割)


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93258

Title:	Character Segmentation in Scene-Text Images Based on Weakly Supervised Learning (基於弱監督式學習之自然場景文字字元分割)
Author:	Chen, Li-Zhu (陳莉筑)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Deep learning; semantic segmentation; arbitrary orientations text localization; weakly supervised learning
Date:	2023-07-25
Upload time:	2024-09-19 16:50:55 (UTC+8)
Publisher:	National Central University
Abstract:	Deep learning-based scene-text detection has been widely studied in recent years. Most work targets word-level detection and has achieved promising results. However, fonts vary widely, the backgrounds of test images tend to be complex, and text may be occluded. When scene text also appears in diverse orientations, accurate word-level detection is hard to achieve, which in turn degrades the accuracy of the subsequent text recognition stage. To address the difficulty of detecting irregularly oriented words, this study proposes a pixel-level character detection network. Detecting individual characters lets the detection boxes adhere more closely to the text boundaries, reducing the influence of complex backgrounds on the detection network. Subsequent text recognition may then be able to use a lighter-weight recognition network, reducing the resources and time required for training. The main challenge in character detection is that existing scene-text detection datasets use word-level annotations, because annotating characters by hand is laborious and time-consuming. To compensate for the missing character annotations in the training set, we generate a large volume of synthetic data that closely resembles real scenes, and combine it with weakly supervised learning on real images that carry only word-level annotations. For these real images without character-level annotations, the network iteratively updates its results so that it automatically learns to detect more reliable character positions, improving model performance. Because character-annotated test data is also lacking, we propose a new evaluation method for character detection. Experimental results show that our method outperforms other character detection models on the ICDAR2017, TotalText, and CTW-1500 datasets. We also apply the same approach to train Chinese character detection, to verify that the proposed method is feasible for content in other languages.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations


All items in NCUIR are protected by original copyright.
