NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download. Item 987654321/86324


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86324


    Title: 具有注意力門之卷積遞迴神經網路於實時單通道語音增強;Convolutional Recurrent Neural Network With Attention Gates For Real-time Single-channel Speech Enhancement
    Authors: 吳文宇;Wu, Wen-Yu
    Contributors: Department of Communication Engineering
    Keywords: Deep Learning;Real-time Speech Enhancement;Convolutional Recurrent Neural Network
    Date: 2021-07-16
    Issue Date: 2021-12-07 12:32:46 (UTC+8)
    Publisher: National Central University
    Abstract: Noise is present in nearly every indoor and outdoor environment, and it degrades both speech quality and automatic speech recognition. Product development, for example of smart speakers, therefore has to take real-time speech enhancement performance into account. Traditional speech enhancement algorithms reduce stationary noise, such as air-conditioner noise, well, but their effect on non-stationary noise, such as wind noise, is limited. With deep learning now widely adopted, speech enhancement benefits from it and can handle non-stationary noise effectively.
    This thesis proposes a convolutional recurrent neural network (CRNN) model with attention gates (AG) for speech enhancement. The model combines the strengths of the convolutional neural network (CNN), such as powerful feature extraction, with attention gates that emphasize important features and suppress irrelevant ones, and with the strengths of the long short-term memory (LSTM) network, such as modeling temporal dynamics in time series. As a result, the model can effectively estimate the complex ratio mask (CRM) and thereby obtain better speech quality. Because the proposed model has only 2.3M parameters and low computational complexity, it can achieve real-time speech enhancement.
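    As a rough illustration of the complex ratio mask mentioned in the abstract, the sketch below applies a mask to one frame of a noisy spectrum by complex multiplication per frequency bin. It is not the thesis implementation: the function names `ideal_crm` and `apply_crm`, the toy values, and the use of an ideal mask (computed from known clean speech) in place of a network-estimated mask are all assumptions made for this example.

```python
# Illustrative only: applying a complex ratio mask (CRM) to one noisy STFT frame.
# Function names and toy values are hypothetical, not taken from the thesis.

def ideal_crm(clean, noisy, eps=1e-8):
    """Ideal CRM per frequency bin: M = S / Y, written as S * conj(Y) / |Y|^2."""
    return [s * y.conjugate() / (abs(y) ** 2 + eps) for s, y in zip(clean, noisy)]

def apply_crm(mask, noisy):
    """Enhanced spectrum per bin: S_hat = M * Y (complex multiplication)."""
    return [m * y for m, y in zip(mask, noisy)]

# Toy spectra for three frequency bins (complex STFT coefficients).
clean = [1 + 2j, 0.5 - 1j, -0.3 + 0.1j]
noise = [0.2 - 0.1j, -0.4 + 0.3j, 0.1 + 0.2j]
noisy = [s + n for s, n in zip(clean, noise)]

# In the thesis the mask is estimated by the CRNN; the ideal mask stands in here.
mask = ideal_crm(clean, noisy)
enhanced = apply_crm(mask, noisy)

# With the ideal mask, the enhanced spectrum matches the clean one up to eps.
max_error = max(abs(e - s) for e, s in zip(enhanced, clean))
print(max_error)
```

    Because the mask is complex, it adjusts both the magnitude and the phase of each bin, which is what distinguishes it from a real-valued magnitude mask.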
    Appears in Collections:[Graduate Institute of Communication Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
