Investigations of Cochlear Implant Sound Coding Strategies Based on Auditory Physiology and Deep Learning

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93358

Title:	Investigations of Cochlear Implant Sound Coding Strategies Based on Auditory Physiology and Deep Learning
Author:	Huang, Enoch Hsin-Ho (黃心和)
Contributor:	Department of Electrical Engineering
Keywords:	cochlear implant;sound coding strategy;auditory physiology;deep learning;speech intelligibility
Date:	2023-07-26
Upload time:	2024-09-19 16:55:45 (UTC+8)
Publisher:	National Central University
Abstract:	This dissertation investigates cochlear implant (CI) sound coding strategies. It explores the principles and mechanisms of coding strategies based on auditory physiology and on deep learning, and simulates their performance in terms of Mandarin speech intelligibility. A sound coding strategy converts key speech information into neural impulse patterns that the auditory brain can recognize, so that the compressed electrical stimuli can pass through the limited electroneural bottleneck.
Because CI listening still has limitations, improving the sound coding strategy is important. This study applies knowledge of auditory physiology and artificial intelligence (AI) techniques separately to the improvement of CI coding strategies. In the auditory physiology part, three physiologically based coding strategies are selected: the biologically inspired hearing aid (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod). These are integrated with the currently most widely used advanced combination encoder (ACE) strategy to form four singular coding strategies. Four combinational coding strategies are then derived from them, and a comparative study is conducted. In the deep learning part, unlike traditional coding strategies and machine-learning-based preprocessing, this study introduces ElectrodeNet, a coding strategy developed directly with deep learning. ElectrodeNet is evaluated with deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) architectures and compared under a range of experimental conditions. An improved strategy that incorporates a channel selection (CS) function, ElectrodeNet-CS, is also proposed. CI-simulated speech was synthesized with a vocoder and assessed both with objective measures and with Mandarin sentence listening tests in normal-hearing participants on the NCU-CI experimental platform. In the auditory physiology results, the EE strategy scored slightly higher on average than ACE in short-time objective intelligibility (STOI) and in the listening experiments at signal-to-noise ratios (SNRs) of 5 dB or above. In the combinational coding strategies, enabling the EE function also slightly improved the speech intelligibility of the other coding strategies. In the deep learning results, the ElectrodeNet variants based on the DNN, CNN, and LSTM architectures showed high correlations with the ACE strategy in STOI and normalized covariance metric (NCM) scores.
Across training datasets in different languages and across different noise types, ElectrodeNet and ACE also remained closely related. The more advanced ElectrodeNet-CS strategy even slightly surpassed ACE in STOI scores. In sum, this research proposes combinational coding strategies based on auditory physiology, conducts a comparative study of them, and develops sound coding strategies with deep learning at their core. The outcomes demonstrate the feasibility of the proposed approaches and may offer insights for related fields.
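
As a rough illustration of what a channel selection (CS) step does, the sketch below keeps the n largest channel envelopes in a single analysis frame and zeroes the rest, which is the general n-of-m idea behind ACE-style strategies. It is not taken from the dissertation: the function name `select_n_of_m`, the six-channel frame, and the choice of n = 3 are all hypothetical, and the abstract does not say how ElectrodeNet-CS implements its selection.

```python
from typing import List


def select_n_of_m(envelopes: List[float], n: int) -> List[float]:
    """Keep the n largest channel envelopes in one frame; zero the rest.

    Hypothetical helper for illustration only, not the thesis code.
    """
    if not 0 <= n <= len(envelopes):
        raise ValueError("n must be between 0 and the number of channels")
    # Rank channel indices by envelope magnitude, largest first.
    # sorted() is stable, so ties are resolved in favor of the lower index.
    ranked = sorted(range(len(envelopes)), key=lambda ch: envelopes[ch], reverse=True)
    keep = set(ranked[:n])
    return [env if ch in keep else 0.0 for ch, env in enumerate(envelopes)]


# One made-up frame of six channel envelopes (arbitrary units).
frame = [0.1, 0.7, 0.3, 0.9, 0.2, 0.5]
print(select_n_of_m(frame, 3))  # [0.0, 0.7, 0.0, 0.9, 0.0, 0.5]
```

Only the selected channels would go on to be mapped to electrode stimulation levels; a real strategy also involves filterbank analysis, envelope extraction, and loudness mapping, none of which are shown here.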
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses & Dissertations

All items in NCUIR are protected by original copyright.
