    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86324


    Title: 具有注意力門之卷積遞迴神經網路於實時單通道語音增強; Convolutional Recurrent Neural Network With Attention Gates For Real-time Single-channel Speech Enhancement
    Author: 吳文宇; Wu, Wen-Yu
    Contributor: Department of Communication Engineering
    Keywords: Deep Learning; Real-time Speech Enhancement; Convolutional Recurrent Neural Network
    Date: 2021-07-16
    Upload time: 2021-12-07 12:32:46 (UTC+8)
    Publisher: National Central University
    Abstract: Noise is present in nearly every indoor and outdoor environment, and it degrades both speech quality and automatic speech recognition. Products such as smart speakers therefore need speech enhancement that runs in real time. Traditional speech enhancement algorithms reduce stationary noise, such as air conditioner noise, well, but their effect on non-stationary noise, such as wind noise, is limited. Speech enhancement based on deep learning can handle non-stationary noise effectively.
    This thesis proposes a convolutional recurrent neural network (CRNN) model with attention gates (AG) for speech enhancement. The model combines the powerful feature extraction of a convolutional neural network (CNN), attention gates that enhance important features and suppress irrelevant parts, and the time-series dynamic modeling of a long short-term memory (LSTM) network. As a result, the model can effectively estimate the complex ratio mask (CRM) and thereby obtain better speech quality. Because the proposed model has only 2.3M parameters and low computational complexity, it can achieve real-time speech enhancement.
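The abstract says the network estimates a complex ratio mask but does not spell out how such a mask is used. As a rough illustration only, the sketch below follows the common definition, in which the mask is the per-bin ratio of the clean spectrum to the noisy spectrum and enhancement is a complex multiplication of mask and noisy spectrum. The function names, the `eps` guard, and the toy values are assumptions for illustration, not taken from the thesis, and the thesis may use a bounded or compressed variant of the mask. The network that predicts the mask is not shown.

```python
from typing import List

# A spectrum is indexed as [frame][frequency_bin]; each entry is one
# complex STFT coefficient. This layout is an assumption for the sketch.
Spectrum = List[List[complex]]


def ideal_complex_ratio_mask(clean: Spectrum, noisy: Spectrum,
                             eps: float = 1e-8) -> Spectrum:
    """Return M = S / Y per time-frequency bin (a typical training target)."""
    mask = []
    for s_frame, y_frame in zip(clean, noisy):
        row = []
        for s, y in zip(s_frame, y_frame):
            # S / Y == S * conj(Y) / |Y|^2; eps avoids division by zero.
            row.append(s * y.conjugate() / (abs(y) ** 2 + eps))
        mask.append(row)
    return mask


def apply_complex_ratio_mask(mask: Spectrum, noisy: Spectrum) -> Spectrum:
    """Return the enhanced spectrum S_hat = M * Y (complex product per bin)."""
    return [[m * y for m, y in zip(m_frame, y_frame)]
            for m_frame, y_frame in zip(mask, noisy)]


# Toy example: one frame with two frequency bins (made-up values).
clean = [[1 + 1j, 0.5 - 0.2j]]
noise = [[0.3 - 0.1j, -0.2 + 0.4j]]
noisy = [[s + n for s, n in zip(s_f, n_f)] for s_f, n_f in zip(clean, noise)]

mask = ideal_complex_ratio_mask(clean, noisy)
enhanced = apply_complex_ratio_mask(mask, noisy)
```

With the ideal mask, `enhanced` matches `clean` up to the small error introduced by `eps`. In a trained system the mask would come from the model's output rather than from the clean signal, which is unavailable at inference time.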
    Appears in Collections: [Graduate Institute of Communication Engineering] Master's and Doctoral Theses

    Files in this item:

    File          Description   Size   Format   Views
    index.html                  0Kb    HTML     106


    All items in NCUIR are protected by original copyright.
