

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92992


    Title: Deep Learning for Single-Channel Speech Separation
    Author: Tan, Ha Minh (何銘津)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Deep learning; single-channel acoustic decomposition; discrimination-vector learning; time-domain audio separation; lightweight network
    Date: 2023-02-03
    Upload time: 2024-09-19 16:37:22 (UTC+8)
    Publisher: National Central University
    Abstract: This dissertation addresses single-channel speech separation by exploiting deep neural networks (DNNs), along three directions. First, we approach single-channel source separation in the frequency-to-time domain, where models built on embedding vectors, such as deep clustering, have achieved ground-breaking success. Inspired by deep clustering, we develop a new framework, Encoder Squash-norm Deep Clustering (ESDC). The results show that our proposed framework significantly improves the performance of single-channel acoustic decomposition in comparison with current techniques, including deep clustering, the deep extractor network (DENet), the deep attractor network (DANet), and several updated versions of deep clustering. Second, we propose time-domain monaural acoustic decomposition built on the inter-segment and intra-segment architecture of the dual-path recurrent neural network (DPRNN), which offers cutting-edge performance and can model exceedingly long sequences. On top of it we introduce a new selective mutual learning (SML) approach: two DPRNNs exchange knowledge and learn from one another, with each network guided by the other's high-confidence predictions while the low-confidence predictions are disregarded. According to the experimental findings, selective mutual learning greatly outperforms other training methods such as independent training, knowledge distillation, and plain mutual learning using the same model design. Finally, we introduce a lightweight yet effective network for speech separation, namely SeliNet. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. The experimental results show that SeliNet obtains state-of-the-art (SOTA) performance while keeping the number of floating-point operations (FLOPs) and the model size small.
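    The embedding-vector idea behind deep clustering (and hence behind ESDC) trains a network to map each time-frequency bin to an embedding whose pairwise affinities match the ideal speaker assignments. A minimal NumPy sketch of that affinity loss, ||VV^T − YY^T||_F^2, computed without forming the N×N affinity matrices; the toy sizes and random data are for illustration only:

    ```python
    import numpy as np

    # Deep-clustering affinity loss for N time-frequency bins:
    #   V: (N, D) unit-norm embeddings, Y: (N, C) one-hot speaker assignments.
    # ||V V^T - Y Y^T||_F^2 expands to ||V^T V||_F^2 - 2||V^T Y||_F^2 + ||Y^T Y||_F^2,
    # which avoids materializing any N x N matrix.
    def deep_clustering_loss(V, Y):
        VtV = V.T @ V          # (D, D)
        VtY = V.T @ Y          # (D, C)
        YtY = Y.T @ Y          # (C, C)
        return np.sum(VtV ** 2) - 2.0 * np.sum(VtY ** 2) + np.sum(YtY ** 2)

    rng = np.random.default_rng(0)
    N, D, C = 6, 4, 2                                 # toy sizes
    Y = np.eye(C)[rng.integers(0, C, size=N)]         # one-hot assignments
    V = rng.normal(size=(N, D))
    V /= np.linalg.norm(V, axis=1, keepdims=True)     # unit-norm embeddings
    loss = deep_clustering_loss(V, Y)
    ```

    Minimizing this loss pushes embeddings of bins dominated by the same speaker together and embeddings of different speakers apart, so clustering the embeddings at test time recovers the separation masks.
    
    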
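    The selective-mutual-learning scheme described above can be sketched as a loss that adds a peer-imitation term only where the peer is confident. This is a simplified, hypothetical per-frame version: `selective_mutual_loss`, the MSE task loss, the toy confidence measure, and the threshold/weight values are illustrative assumptions, not the dissertation's actual formulation.

    ```python
    import numpy as np

    # One network's training loss under selective mutual learning (sketch):
    # its own task loss, plus imitation of the peer network restricted to
    # frames where the peer's mask-like prediction (in [0, 1]) is confident.
    def selective_mutual_loss(own_pred, peer_pred, target, threshold=0.8, weight=0.5):
        task_loss = np.mean((own_pred - target) ** 2)   # own task loss (toy MSE)
        confidence = np.abs(peer_pred - 0.5) * 2.0      # toy confidence in [0, 1]
        mask = confidence > threshold                   # keep high-confidence frames
        if mask.any():
            imitation = np.mean((own_pred[mask] - peer_pred[mask]) ** 2)
        else:
            imitation = 0.0                             # no confident peer frames
        return task_loss + weight * imitation
    ```

    Each of the two DPRNNs would be optimized with its own copy of this loss, so knowledge flows in both directions but only through high-confidence predictions, unlike plain mutual learning, which imitates the peer everywhere.
    
    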
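    The atrous temporal pyramid pooling that SeliNet employs can be illustrated with a toy single-channel version: the same kernel is applied at several dilation rates and the branch outputs are stacked, enlarging the temporal receptive field without adding parameters. The helper names, the fixed odd kernel, and the rate set are assumptions for illustration.

    ```python
    import numpy as np

    # 1-D dilated ("atrous") convolution with zero 'same' padding.
    # Assumes a single channel and an odd-length kernel.
    def dilated_conv1d(x, kernel, dilation):
        k = len(kernel)
        pad = dilation * (k - 1) // 2
        xp = np.pad(x, pad)                       # zero-pad both ends
        out = np.zeros(len(x))
        for t in range(len(x)):
            for i in range(k):
                out[t] += kernel[i] * xp[t + i * dilation]
        return out

    # Pyramid: run the same kernel at several dilation rates and stack the
    # branch outputs, so each output position sees progressively wider context.
    def atrous_pyramid(x, kernel, rates=(1, 2, 4)):
        return np.stack([dilated_conv1d(x, kernel, r) for r in rates])
    ```

    Stacking branches with rates 1, 2, and 4 lets a 3-tap kernel cover contexts of 3, 5, and 9 samples at the same cost per branch, which is why this construction suits a lightweight network with a small FLOP budget.
    
    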
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses & Dissertations

    Files in This Item:

    index.html (HTML, 0 KB)

    All items in NCUIR are protected by copyright, with all rights reserved.

