EPG2S: Speech Synthesis Technology Based on Electropalatography Signal


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86818

Title:	EPG2S: Speech Synthesis Technology Based on Electropalatography Signal
Authors:	Chen, Po-Hsun (陳柏勳)
Contributors:	Department of Computer Science and Information Engineering
Keywords:	multimodal; electropalatography; speech synthesis; speech enhancement
Date:	2021-09-27
Issue Date:	2021-12-07 13:16:05 (UTC+8)
Publisher:	National Central University
Abstract:	Synthesizing speech from articulatory movement can benefit patients with vocal cord disorders, situations that require silent communication, and high-noise environments. In this study, we explore an alternative data source, electropalatography (EPG), and propose a novel multimodal EPG-to-speech (EPG2S) synthesis system. The model has two goals: (1) synthesize speech from the EPG signal alone; and (2) when the speaker's audio signal can be recorded simultaneously in a noisy environment, perform speech enhancement (SE) by leveraging the EPG signal. Two fusion strategies are investigated for the EPG2S system: late fusion (LF) and early fusion (EF). Experimental results on a Mandarin corpus show that, for the first goal, the speech synthesized by the proposed multimodal EPG2S systems outperforms, on average, speech corrupted by real-world background noise at an SNR of -5 dB or lower. For the second goal, these systems outperform their audio-only SE counterparts on the PESQ, STOI, and ESTOI speech evaluation metrics. These results verify the feasibility of synthesizing speech from EPG signals and the effectiveness of incorporating them into an SE system.
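The abstract names two fusion strategies, early fusion (EF) and late fusion (LF), without describing the EPG2S architecture itself. The sketch below is a generic, hypothetical illustration of the difference, not the thesis's model: in EF the EPG frames and noisy-speech frames are concatenated before any processing, while in LF each modality passes through its own encoder and the encoded representations are merged afterwards. All dimensions (62 EPG electrodes, 257 spectral bins, 128 hidden units), the helper `dense`, and the randomly initialised single-layer networks are assumptions chosen only to make the data flow concrete; a real system would use trained, deeper networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the thesis):
N_FRAMES = 100     # time frames, assumed already aligned across modalities
EPG_DIM = 62       # binary tongue-palate contact pattern per frame
AUDIO_DIM = 257    # log-magnitude spectrum bins of noisy speech (512-point FFT)
HIDDEN = 128       # hidden size of the toy encoders
OUT_DIM = AUDIO_DIM  # the model predicts an enhanced spectrum per frame


def dense(in_dim, out_dim, relu=True):
    """Hypothetical helper: a randomly initialised linear layer (toy stand-in for a trained network)."""
    w = rng.standard_normal((in_dim, out_dim)) * 0.01
    b = np.zeros(out_dim)

    def layer(x):
        y = x @ w + b
        return np.maximum(y, 0.0) if relu else y

    return layer


def early_fusion_model():
    # EF: concatenate both modalities frame by frame BEFORE any processing.
    encoder = dense(EPG_DIM + AUDIO_DIM, HIDDEN)
    decoder = dense(HIDDEN, OUT_DIM, relu=False)  # linear output: log spectra can be negative

    def forward(epg, audio):
        x = np.concatenate([epg, audio], axis=-1)
        return decoder(encoder(x))

    return forward


def late_fusion_model():
    # LF: each modality has its own encoder; encoded features are merged afterwards.
    epg_encoder = dense(EPG_DIM, HIDDEN)
    audio_encoder = dense(AUDIO_DIM, HIDDEN)
    decoder = dense(2 * HIDDEN, OUT_DIM, relu=False)

    def forward(epg, audio):
        h = np.concatenate([epg_encoder(epg), audio_encoder(audio)], axis=-1)
        return decoder(h)

    return forward


# Toy inputs: random binary EPG contacts and random "noisy" spectra.
epg = rng.integers(0, 2, size=(N_FRAMES, EPG_DIM)).astype(float)
noisy = rng.standard_normal((N_FRAMES, AUDIO_DIM))

print(early_fusion_model()(epg, noisy).shape)  # (100, 257)
print(late_fusion_model()(epg, noisy).shape)   # (100, 257)
```

Both variants map the same aligned inputs to one enhanced frame per input frame; they differ only in where the modalities meet. EF lets the first layer model cross-modal interactions directly, whereas LF keeps modality-specific encoders separate, which makes it straightforward to train or inspect each branch on its own.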
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
