Convolutional Recurrent Neural Network With Attention Gates For Real-time Single-channel Speech Enhancement


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86324

Title:	Convolutional Recurrent Neural Network With Attention Gates For Real-time Single-channel Speech Enhancement (具有注意力門之卷積遞迴神經網路於實時單通道語音增強)
Author:	Wu, Wen-Yu (吳文宇)
Contributor:	Department of Communication Engineering
Keywords:	Deep Learning; Real-time Speech Enhancement; Convolutional Recurrent Neural Network
Date:	2021-07-16
Uploaded:	2021-12-07 12:32:46 (UTC+8)
Publisher:	National Central University
Abstract:	Noise is pervasive in both indoor and outdoor environments, and it degrades not only speech quality but also automatic speech recognition. Products such as smart speakers therefore need speech enhancement that runs in real time. Traditional speech enhancement algorithms reduce stationary noise, such as air-conditioner noise, well, but their effect on non-stationary noise, such as wind noise, is limited. Deep learning-based speech enhancement can handle non-stationary noise effectively. This thesis proposes a convolutional recurrent neural network (CRNN) model with attention gates (AG) for speech enhancement. The model combines the strengths of the convolutional neural network (CNN), such as powerful feature extraction, with attention gates that enhance important features and suppress irrelevant ones, and with the strengths of the long short-term memory (LSTM) network, such as modeling the dynamics of time series. As a result, the model can effectively estimate the complex ratio mask (CRM) and thereby obtain better speech quality. Because the proposed model has only 2.3M parameters and low computational complexity, it can achieve real-time speech enhancement.
Appears in Collections:	[Graduate Institute of Communication Engineering] Theses and Dissertations
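
The abstract says the model estimates a complex ratio mask (CRM). As a rough illustration of what such a mask does, and not the thesis's implementation, the sketch below computes the ideal mask M = S / Y for each time-frequency bin from a known clean spectrum S and noisy spectrum Y, then applies it by complex multiplication. In the thesis the mask is predicted by the network, since the clean signal is unknown at inference time. The names `ideal_crm` and `apply_crm`, the list-of-lists spectrogram layout, and the `eps` stabilizer are assumptions made for this example.

```python
from typing import List

# Assumed layout for this sketch: spectrogram[frame][frequency_bin] -> complex STFT value
Spectrogram = List[List[complex]]


def ideal_crm(clean: Spectrogram, noisy: Spectrogram, eps: float = 1e-12) -> Spectrogram:
    """Ideal complex ratio mask M = S / Y, computed per time-frequency bin."""
    mask = []
    for s_frame, y_frame in zip(clean, noisy):
        row = []
        for s, y in zip(s_frame, y_frame):
            # |Y|^2 plus a small constant to avoid division by zero
            power = y.real ** 2 + y.imag ** 2 + eps
            m_real = (y.real * s.real + y.imag * s.imag) / power
            m_imag = (y.real * s.imag - y.imag * s.real) / power
            row.append(complex(m_real, m_imag))
        mask.append(row)
    return mask


def apply_crm(noisy: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Enhanced spectrum S_hat = M * Y (complex multiplication per bin)."""
    return [
        [m * y for m, y in zip(m_frame, y_frame)]
        for m_frame, y_frame in zip(mask, noisy)
    ]


# Toy single-frame example with two frequency bins (made-up values)
clean = [[1 + 2j, 0.5 - 1j]]
noisy = [[2 + 1j, -1 + 0.5j]]
enhanced = apply_crm(noisy, ideal_crm(clean, noisy))
```

Because the mask is complex-valued, it corrects both the magnitude and the phase of each bin, which is why applying the ideal mask recovers the clean spectrum up to the small `eps` term.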

