EPG2S: Speech Synthesis Technology Based on Electropalatography Signal

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86818

Title:	EPG2S：基於電子硬顎圖訊號的語音生成技術; EPG2S: Speech Synthesis Technology Based on Electropalatography Signal
Author:	陳柏勳; Chen, Po-Hsun
Contributor:	Department of Computer Science and Information Engineering
Keywords:	multimodal; electropalatography; speech synthesis; speech enhancement
Date:	2021-09-27
Upload time:	2021-12-07 13:16:05 (UTC+8)
Publisher:	National Central University
Abstract:	Synthesizing speech from articulatory movement can benefit patients with vocal cord disorders, situations that require silent communication, and high-noise environments. In this study, we explore an alternative data source, electropalatography (EPG), and propose a novel multimodal EPG-to-speech (EPG2S) synthesis system. Our model has two goals: (1) to synthesize speech using only the EPG signal; (2) when the speaker's audio signal can also be captured in a noisy environment, to perform speech enhancement (SE) by leveraging the EPG signal. Two fusion strategies are investigated for the EPG2S system, namely late fusion (LF) and early fusion (EF). Experimental results on a Mandarin corpus show that, for the first goal, the proposed multimodal EPG2S systems on average outperform speech corrupted by real-world background noise at SNR levels of -5 dB or lower. For the second goal, these systems outperform their audio-only SE counterparts on the PESQ, STOI, and ESTOI speech evaluation metrics. These results verify the feasibility of using EPG signals to synthesize speech and the effectiveness of incorporating them into SE systems.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	106

All items in NCUIR are protected by the original copyright.
