Master's/Doctoral Thesis 107523059 Detailed Record




Name: Hsin-Cheng Lee (李欣成)    Department: Communication Engineering
Thesis Title: A Query by Singing/Humming System Based on Dynamic Time Warping Combined with Linear Scaling
  1. Access rights for this electronic thesis: approved for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese) With the widespread adoption of smartphones and mobile networks, searching for and downloading music through streaming platforms or social networking sites has become part of everyday life. When the melody of a song is stuck in a user's head but the title and lyrics cannot be recalled, a content-based music retrieval (CBMR) system such as a query-by-singing/humming system can solve the problem by using content features of the song itself, such as melody and rhythm, as the basis for retrieval.
For a large retrieval database, a trade-off must be made between recognition accuracy and computation time. Previous studies first filter out dissimilar songs with fast but less accurate methods such as linear scaling (LS) or the Earth Mover's Distance (EMD), then compare the remaining songs precisely with the computationally expensive Dynamic Time Warping (DTW), and finally fuse the similarity scores to output a list of the ten most similar songs. In this work, LS is used both to shorten the melody features, which reduces the amount of computation, and as a tool for fine-tuning feature length inside the refinement module, which further raises accuracy. High-precision DTW computes the matching distance by shifting and stretching a matching window over each song to find the corresponding melody start point and the best matching distance; a refinement module and a pre-filter are also designed. Experimental results show that the MRR of the proposed system is higher than that of previous studies, and that linearly shortening the data and optimizing the thresholds save about 40% of the computation time.
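
As an illustration of the matching step described above, the following Python sketch shows the general idea under several assumptions: pitch contours are taken to be 1-D sequences of semitone values that have already been extracted, and the scale factors, hop size, and function names (linear_scale, dtw_distance, best_match) are hypothetical choices made here for illustration, not the thesis implementation. The sketch resamples a reference window to the query length with linear scaling, then shifts and stretches that window over the reference song to find the start point and scale with the smallest DTW distance.

import numpy as np

def linear_scale(contour, target_len):
    # Linearly resample a pitch contour to target_len points (linear scaling, LS).
    src = np.asarray(contour, dtype=float)
    x_old = np.linspace(0.0, 1.0, num=len(src))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(x_new, x_old, src)

def dtw_distance(a, b):
    # Classic dynamic-programming DTW with the absolute semitone difference
    # as the local cost; the accumulated cost is length-normalised at the end.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def best_match(query, reference, scales=(0.8, 0.9, 1.0, 1.1, 1.2), hop=8):
    # Shift and stretch a matching window over the reference song and keep the
    # start point and scale factor that give the smallest DTW distance.
    q = np.asarray(query, dtype=float)
    q = q - q.mean()                          # crude key transposition
    ref = np.asarray(reference, dtype=float)
    best = (np.inf, None, None)               # (distance, start index, scale)
    for scale in scales:
        win = max(2, int(round(len(q) * scale)))
        for start in range(0, max(1, len(ref) - win + 1), hop):
            seg = linear_scale(ref[start:start + win], len(q))
            seg = seg - seg.mean()            # align keys before comparing
            d = dtw_distance(q, seg)
            if d < best[0]:
                best = (d, start, scale)
    return best

Ranking every reference song by the distance returned from best_match and keeping the ten smallest would give the candidate list mentioned above; the refinement module and pre-filter of the proposed system are omitted from this sketch.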
Abstract (English) With the growing popularity of mobile devices and internet services, the amount of music data distributed over the internet is increasing every day. Music information retrieval systems capable of searching for music accurately and quickly are therefore attracting more and more attention.
Sometimes users remember only the melody and forget the lyrics; a content-based music retrieval (CBMR) system such as query by singing/humming (QBSH) can solve this problem by using features extracted from the music itself for searching.
To deal with a massive retrieval database, we need to balance accuracy and computation time. Previous research combines multiple classifiers using score-level fusion to reduce computation time, but a poor classifier leads to poor accuracy. Instead of score-level fusion, the proposed system combines DTW with linear scaling (LS), using LS to shorten the query and reference songs at different stages to reduce computation time and using DTW to compute the similarity between the query and the reference songs. We also design a refinement module and a pre-filter to enhance accuracy.
The experimental results show that our method provides a higher MRR than previous approaches, and that scaling down the data and finding the best threshold settings reduce computation time by about 40%.
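
For reference, the MRR figure quoted above can be computed as in the short Python sketch below; this is an assumed illustration of the standard Mean Reciprocal Rank metric rather than the evaluation code of the thesis. Each query contributes the reciprocal of the rank at which the correct song appears in the returned candidate list, and the contributions are averaged over all queries.

def mean_reciprocal_rank(ranked_lists, ground_truth):
    # ranked_lists[i]: ordered candidate song IDs returned for query i
    # ground_truth[i]: the correct song ID for query i
    total = 0.0
    for candidates, answer in zip(ranked_lists, ground_truth):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
        # a query whose correct song is not retrieved contributes 0
    return total / len(ranked_lists)

# Three queries: correct song ranked 1st, 3rd, and not retrieved at all.
print(mean_reciprocal_rank([["A", "B"], ["C", "D", "A"], ["E"]],
                           ["A", "A", "Z"]))   # (1 + 1/3 + 0) / 3 = 0.444...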
Keywords (Chinese) ★ Music Information Retrieval
★ Query by Singing/Humming
★ Dynamic Time Warping
★ Linear Scaling
Keywords (English) ★ Music Retrieval
★ Query by singing and humming
★ Dynamic Time Warping
★ Linear Scaling
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1.1  Research Background
1.2  Research Motivation and Objectives
1.3  Thesis Organization
Chapter 2  Music Information Retrieval
2.1  Types and Characteristics of Music Retrieval
2.1.1  Current Status of Query by Singing/Humming
2.2  Query by Singing/Humming Framework
2.2.1  Related Research on Query by Singing/Humming
2.2.2  Query by Singing/Humming Features
2.2.3  Introduction to the MIDI File Format
2.3  Feature Extraction Methods
2.3.1  Fundamental Frequency to Semitone Conversion
2.3.2  Spatial Filtering
2.3.3  Smoothing
2.3.4  Matching Efficiency
2.3.5  Vector Distance
Chapter 3  Melody Feature Matching Algorithms
3.1  Key Transposition
3.2  Linear Scaling (LS)
3.3  Earth Mover's Distance (EMD)
3.4  Dynamic Time Warping (DTW)
Chapter 4  Proposed System Architecture
4.1  Preprocessing
4.2  Matching Stage
4.3  Matching Procedure
Chapter 5  Analysis of Experimental Results
5.1  Experimental Environment
5.1.1  Datasets
5.1.2  Performance Evaluation Methods
5.2  Experiment Design
5.2.1  Pitch Tracker Parameter Settings
5.2.2  Refinement Module Threshold Selection
5.2.3  Pre-filter Threshold Selection
5.2.4  Final Parameter Settings
5.2.5  Effect of Linear Scaling on Computation Time
5.2.6  Retrieval Results on Dataset 2
5.3  Comparison and Analysis of Experimental Results
Chapter 6  Conclusion and Future Work
References
Advisor: Pao-Chi Chang (張寶基)    Approval Date: 2021-1-29
