NCU Institutional Repository (中大機構典藏) — theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/90040
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90040


    Title: 深度學習改善哼唱搜尋音樂系統;Query by Singing/Humming System Improved by Deep Learning
    Authors: 周怡蓁;Chou, Yi-Chen
    Contributors: Department of Electrical Engineering (電機工程學系)
    Keywords: 哼唱搜尋音樂系統;深度學習;卷積神經網路;Shazam演算法;Query by singing/humming (QbSH) system;Deep learning;Convolutional neural network;Shazam algorithm
    Date: 2022-08-24
    Issue Date: 2022-10-04 12:08:51 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: Music is part of everyday life, and familiar melodies can be heard everywhere. When an unknown but familiar melody comes to mind, people often hum it, imitating its pitch and rhythm, in order to find the song that contains it; query by singing/humming (QbSH) systems were developed for exactly this purpose. According to where their features are extracted from, this thesis proposes two QbSH systems, Dai-ChouNet27 and QBSHNet03. Dai-ChouNet27 is designed with reference to the architecture of DaiNet34, which performs strongly on environmental sound classification. It is an almost fully convolutional network whose last two layers are fully connected: the first convolutional layer, with a large kernel, filters noise out of the raw waveform; the subsequent convolutional layers extract high-level features directly from the raw waveform; and the final two fully-connected layers classify those features to produce the result.
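The raw-waveform pipeline described above can be sketched as follows. This is a minimal illustrative outline only: the layer counts, kernel sizes, strides, and channel widths here are assumptions for demonstration, not the actual Dai-ChouNet27 hyperparameters.

```python
# Sketch of a Dai-ChouNet27-style raw-waveform pipeline (shapes illustrative).
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid 1-D convolution of a mono waveform with a single kernel."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)        # 1 s of synthetic audio at 16 kHz

# Stage 1: a large-kernel convolution acts as a learned denoising filter.
large_kernel = rng.standard_normal(80) / 80
filtered = conv1d(waveform, large_kernel, stride=4)

# Stage 2: stacked small-kernel convolutions extract higher-level features.
features = filtered
for _ in range(3):
    small_kernel = rng.standard_normal(3) / 3
    features = np.maximum(conv1d(features, small_kernel, stride=2), 0)  # ReLU

# Stage 3: two fully-connected layers map pooled features to song classes.
pooled = np.array([features.mean(), features.std()])   # crude global pooling
w1, w2 = rng.standard_normal((8, 2)), rng.standard_normal((10, 8))
logits = w2 @ np.maximum(w1 @ pooled, 0)               # scores for 10 songs
print(logits.shape)
```

In a trained network the kernels and weights are learned end to end; random values stand in here only to show the data flow from waveform to class scores.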
    QBSHNet03 is a QbSH system that combines the Shazam algorithm with a convolutional neural network (CNN). The time-domain waveform is first filtered by a ConvRBM to suppress noise. Following the Shazam algorithm, features consisting of frequency and time-difference pairs are extracted from spectrograms computed with the short-time Fourier transform (STFT). Several convolutional layers and two fully-connected layers then classify these feature combinations to produce the result.
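The Shazam-style feature extraction described above can be sketched as pairing spectrogram peaks into (anchor frequency, target frequency, time difference) triples. The window size, hop, and fan-out below are illustrative assumptions, not QBSHNet03's actual parameters.

```python
# Sketch of Shazam-style (frequency, time-difference) feature extraction.
import numpy as np

def spectrogram(x, win=256, hop=128):
    """Magnitude STFT via a sliding Hann window."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))       # shape (time, freq)

def peak_pairs(spec, fan_out=3):
    """Pair each frame's strongest bin with peaks in the next few frames."""
    peaks = [(t, int(np.argmax(frame))) for t, frame in enumerate(spec)]
    pairs = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            pairs.append((f1, f2, t2 - t1))          # the Shazam-style triple
    return pairs

rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)   # synthetic hum
pairs = peak_pairs(spectrogram(tone + 0.05 * rng.standard_normal(8000)))
print(len(pairs), pairs[0])
```

In QBSHNet03 such triples are then arranged as inputs to the convolutional classifier rather than hashed into a lookup table as in the original Shazam system.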
    Three datasets are used to train and test QBSHNet03, Dai-ChouNet27, and DaiNet34: the MIR-QbSH corpus, a corpus of common Taiwanese children's songs, and a corpus of classic English songs. On MIR-QbSH, Dai-ChouNet27 performs much better than QBSHNet03 and DaiNet34: its training accuracy and MRR reach 99% and 0.99, and its testing accuracy, MRR, precision, and recall reach up to 84%, 0.88, 0.78, and 0.74, respectively. These results indicate that, for the QbSH task, features extracted directly from raw waveforms are more suitable than features extracted from spectrograms. Comparing results across different clip lengths and SNR levels on the three datasets, Dai-ChouNet27 achieves outstanding performance whenever the dataset is large enough: with suitable clip lengths and tolerable noise levels, training and testing accuracy and MRR reach at least 84% and 0.87, respectively, and testing precision and recall both exceed 0.7.
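For clarity, the retrieval metrics reported above can be computed as follows: MRR averages the reciprocal rank of the correct song in each ranked result list, and top-1 accuracy counts how often the correct song is ranked first. The song names and rankings below are illustrative data, not results from the thesis.

```python
# Mean reciprocal rank and top-1 accuracy over ranked retrieval results.
def mean_reciprocal_rank(ranked_lists, truths):
    return sum(1.0 / (r.index(t) + 1)
               for r, t in zip(ranked_lists, truths)) / len(truths)

def top1_accuracy(ranked_lists, truths):
    return sum(r[0] == t for r, t in zip(ranked_lists, truths)) / len(truths)

results = [["songA", "songB", "songC"],   # query 1: correct song at rank 1
           ["songB", "songA", "songC"],   # query 2: correct song at rank 2
           ["songC", "songB", "songA"]]   # query 3: correct song at rank 1
truths = ["songA", "songA", "songC"]
print(mean_reciprocal_rank(results, truths))  # (1 + 1/2 + 1) / 3
print(top1_accuracy(results, truths))
```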
    Appears in Collections:[Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html | Size: 0Kb | Format: HTML


    All items in NCUIR are protected by copyright, with all rights reserved.

