NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/95716

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95716


    Title: The application of deep learning in microphone array single-source tracking systems (深度學習應用於麥克風陣列單聲源追蹤系統)
    Authors: PENG, Kuan-Ming (彭冠銘)
    Contributors: Department of Electrical Engineering
    Keywords: microphone array; sound source tracking
    Date: 2024-07-25
    Issue Date: 2024-10-09 17:11:35 (UTC+8)
    Publisher: National Central University
    Abstract: Since the outbreak of the COVID-19 pandemic, demand for remote video conferencing has surged, and with it demand for remote video products. As technology has advanced, a steady stream of auxiliary products has been introduced to make remote meetings more efficient. A common problem, whether a meeting is being recorded or held remotely, is having to check that the speaker is within the camera’s frame, which lowers meeting efficiency. Applying a sound source tracking system to modern meeting scenarios can improve the quality and efficiency of meetings.
    This study pairs a microphone array with a camera to build a sound source tracking device suited to meeting scenarios. Using Python and the LOCATA dataset, which was recorded under real conditions, various microphone array geometries were analyzed for localization accuracy, computation time, and compactness. The final choice was an octahedral microphone array. Six localization algorithms were evaluated with it: three commonly used sound source localization algorithms, namely Minimum Power Distortionless Response (MPDR), Steered Response Power Phase Transform (SRP-PHAT), and Multiple Signal Classification (MUSIC), and three deep-learning-based localization algorithms that maintain good performance in reverberant and noisy environments, namely Cross3D, IcoDOA, and Neural-SRP.
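As a rough illustration of what an octahedral geometry implies for localization, the sketch below places six microphones at the vertices of a regular octahedron and computes the far-field time differences of arrival (TDOAs) that steered-response methods such as SRP-PHAT search over. The abstract does not give the array radius, the orientation, or even confirm that the microphones sit at the vertices, so `octahedron_vertices`, the 5 cm radius, and the 343 m/s speed of sound are all assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed, not from the thesis)


def octahedron_vertices(radius):
    """Six vertices of a regular octahedron centred on the origin (hypothetical layout)."""
    r = radius
    return [(r, 0, 0), (-r, 0, 0), (0, r, 0), (0, -r, 0), (0, 0, r), (0, 0, -r)]


def unit_vector(azimuth_deg, elevation_deg):
    """Unit vector pointing from the array centre towards the source."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def far_field_tdoas(mics, azimuth_deg, elevation_deg, c=SPEED_OF_SOUND):
    """TDOA of each microphone relative to the first one, for a plane wave."""
    u = unit_vector(azimuth_deg, elevation_deg)
    # A microphone displaced towards the source hears the wavefront earlier,
    # so its arrival time relative to the array centre is -(m . u) / c.
    arrival = [-(m[0] * u[0] + m[1] * u[1] + m[2] * u[2]) / c for m in mics]
    return [t - arrival[0] for t in arrival]


if __name__ == "__main__":
    mics = octahedron_vertices(0.05)  # 5 cm radius is an assumed value
    for i, tdoa in enumerate(far_field_tdoas(mics, azimuth_deg=30.0, elevation_deg=10.0)):
        print(f"mic {i}: {tdoa * 1e6:8.2f} microseconds")
```

Because the vertices span all three axes, the TDOA pattern changes with both azimuth and elevation, which is one plausible reason a compact volumetric array can resolve direction in three dimensions.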
    The study simulates and analyzes indoor reverberation and noise, two conditions that degrade sound source localization. It also accounts for the need for fast computation so that real-time tracking is not delayed. Both IcoDOA and Neural-SRP kept localization errors within 10 degrees at signal-to-noise ratios (SNR) from 5 dB to 30 dB and at reverberation times (RT60) from 0.2 s to 1 s. IcoDOA, however, had the shortest computation time, averaging only 2.067 milliseconds per frame.
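A localization error "within 10 degrees" is commonly measured as the great-circle angle between the estimated and true directions of arrival. The abstract does not state which error definition the thesis uses, so the following is only a minimal sketch of that common definition; `angular_error_deg` and its argument layout are illustrative names, not taken from the thesis.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Unit vector for a direction of arrival given in degrees."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(est_az, est_el, true_az, true_el):
    """Great-circle angle in degrees between estimated and true directions."""
    u = direction(est_az, est_el)
    v = direction(true_az, true_el)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    error = angular_error_deg(est_az=42.0, est_el=5.0, true_az=40.0, true_el=0.0)
    print(f"error = {error:.2f} degrees, within 10 degrees: {error <= 10.0}")
```

Measuring the angle between unit vectors rather than subtracting azimuths avoids wrap-around problems at ±180 degrees and takes elevation error into account.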
    The final system therefore uses the octahedral microphone array with the IcoDOA algorithm. In a simulated meeting scenario with a single sound source playing a speech signal, the source stays within the camera’s field of view 91.11% of the time. With a music source in the same kind of scenario, the source stays within the camera’s field of view 87.77% of the time.
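One way to arrive at a figure such as "within the field of view 91.11% of the time" is to count, frame by frame, how often the source azimuth falls inside the camera's horizontal viewing cone. The abstract does not describe how the thesis measured this, so the sketch below is an assumption throughout: `in_view_ratio`, the per-frame azimuth lists, and the 60 degree field of view are hypothetical, and the sample values are made up for the demonstration.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def in_view_ratio(source_az, camera_az, fov_deg):
    """Fraction of frames in which the source lies inside the horizontal field of view."""
    if len(source_az) != len(camera_az) or not source_az:
        raise ValueError("need two non-empty sequences of equal length")
    half_fov = fov_deg / 2.0
    hits = sum(
        1
        for src, cam in zip(source_az, camera_az)
        if abs(wrap_deg(src - cam)) <= half_fov
    )
    return hits / len(source_az)


if __name__ == "__main__":
    # Made-up per-frame azimuths in degrees, for illustration only.
    source = [0.0, 10.0, 170.0, -175.0]
    camera = [0.0, 50.0, -170.0, 100.0]
    ratio = in_view_ratio(source, camera, fov_deg=60.0)
    print(f"source in view for {ratio * 100:.2f}% of frames")
```

Wrapping the difference before comparing it with half the field of view keeps a source at 170 degrees and a camera at -170 degrees only 20 degrees apart, as they are physically.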
    Appears in Collections:[Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    All items in NCUIR are protected by copyright, with all rights reserved.
