Graduate Thesis 107523038: Detailed Record




Author: Bao-Yun Liu (劉寶云)    Department: Communication Engineering
Title: 利用深度學習模型融合多重感測器之小提琴弓法動作辨識
(Violin Bowing Action Recognition based on Multiple Modalities by Deep Learning-Based Sensing Fusion)
Related theses:
★ Super-resolution of satellite images based on regional weighting
★ Adaptive high dynamic range image fusion extending the linear characteristics of exposure curves
★ Complexity control of H.264 video coding on a RISC architecture
★ Articulation disorder assessment based on convolutional recurrent neural networks
★ Few-shot image segmentation using mask generation with a meta-learning classification weight transfer network
★ Implicit representation with attention mechanisms for image-based 3D human model reconstruction
★ Object detection using adversarial graph neural networks
★ 3D face reconstruction based on weakly supervised learning of deformable models
★ Low-latency singing voice conversion on edge devices using unsupervised representation disentanglement
★ Human pose estimation from FMCW radar based on sequence-to-sequence models
★ Monocular semantic scene completion based on multi-level attention mechanisms
★ Contactless real-time vital sign monitoring with a single FMCW radar based on temporal convolutional networks
★ Video traffic description and management over video-on-demand networks
★ High-quality voice conversion based on linear predictive coding and pitch-synchronous processing
★ Tone adjustment by extracting formant variation through speech resampling
★ Optimization of transmission efficiency for real-time fine granularity scalable video over wireless LANs
Access and use:
  1. The electronic full text of this thesis is approved for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): With the rise of artificial intelligence, human action recognition with deep learning has become one of the most important research topics today; in computer vision and pattern recognition in particular, action recognition is a popular research area.
This thesis targets the recognition of bowing actions in violin performance. Multimedia art performances usually demand considerable manpower and time, with repeated tests and rehearsals, before the sound and lighting effects of the venue can be matched perfectly to the performers. If action recognition allows a machine to identify a performer's actions during the performance, the system can then be used to trigger sound and lighting effects and other applications. We propose action recognition with multiple devices, a Kinect camera and a Myo armband inertial sensor, to capture depth images and inertial data. After separate preprocessing and data augmentation, the two data streams are fed into a 3D convolutional architecture and a long short-term memory architecture, respectively, for feature training; decision fusion then combines the features trained by the different models and outputs the final classification result. Data recorded by different devices have their own strengths and weaknesses, so an appropriate combination of multiple devices can compensate for the limitations of any single device. Applied to our self-recorded VAP multi-device violin action dataset, the system achieves good recognition accuracy.
Abstract (English): With the rise of artificial intelligence, the use of deep learning for human action recognition (HAR) has become one of the most important research topics today; in computer vision and pattern recognition, action recognition is a popular research area.
This thesis addresses the recognition of bowing actions in violin performance. Multimedia art performances often require considerable manpower and time, with repeated tests and rehearsals, before the sound and lighting effects of the venue match the performers perfectly. If action recognition enables a machine to identify the actions a performer makes during a performance, the system can then be used to trigger sound and lighting effects and for other applications. We propose to use multiple devices for action recognition: a Kinect depth camera and a Myo armband inertial measurement unit (IMU). After preprocessing and data augmentation, the depth image data are fed into a 3D convolutional network for training, and the inertial data are fed into a long short-term memory (LSTM) network. After training, decision fusion combines the outputs from the different devices to produce the final classification result.
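The abstract describes a two-stream, decision-level fusion design: a 3D CNN over depth clips, an LSTM over IMU sequences, and a fusion of the two streams' class scores. Below is a minimal sketch of that idea in Keras, the framework listed in references [34] and [36]. The input shapes, layer sizes, number of bowing classes, and the score-averaging fusion rule are illustrative assumptions, not the exact configuration used in the thesis.

from tensorflow.keras import layers, Model

NUM_CLASSES = 8  # assumed number of bowing classes (hypothetical)

# Stream 1: 3D CNN over a short clip of depth frames.
# The input shape (frames, height, width, channels) is an assumption.
depth_in = layers.Input(shape=(16, 64, 64, 1), name="depth_clip")
x = layers.Conv3D(32, kernel_size=3, activation="relu")(depth_in)
x = layers.MaxPooling3D(pool_size=2)(x)
x = layers.Conv3D(64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling3D(pool_size=2)(x)
x = layers.GlobalAveragePooling3D()(x)
depth_scores = layers.Dense(NUM_CLASSES, activation="softmax",
                            name="depth_scores")(x)

# Stream 2: LSTM over IMU sequences; six channels stand in for the
# accelerometer and gyroscope axes of the Myo armband.
imu_in = layers.Input(shape=(100, 6), name="imu_sequence")
h = layers.LSTM(64)(imu_in)
imu_scores = layers.Dense(NUM_CLASSES, activation="softmax",
                          name="imu_scores")(h)

# Decision-level fusion: average the per-stream class scores; the class
# with the highest fused score is the final classification.
fused = layers.Average(name="fused_scores")([depth_scores, imu_scores])

model = Model(inputs=[depth_in, imu_in], outputs=fused)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

Averaging the per-stream softmax scores is only one simple form of decision-level fusion; weighted averaging or a small classifier over the concatenated scores are common alternatives.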
Keywords (Chinese): ★ Action recognition
★ Kinect
★ Depth camera
★ Inertial sensor
★ Deep learning
★ Multi-device fusion
Keywords (English): ★ Action recognition
★ Kinect
★ Depth camera
★ Inertial sensor
★ Deep learning
★ Multiple modalities
★ LSTM
★ CNN
★ Violin
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Motivation and Objectives
1.3 Thesis Organization
Chapter 2: Depth Cameras, Inertial Sensors, and Action Recognition
2.1 Depth Cameras
2.1.1 The Kinect Depth Camera
2.1.2 Hardware Specifications
2.1.3 Technology and Features
2.1.4 Development Tools: the Kinect SDK
2.2 Inertial Sensors
2.2.1 The Myo Armband Inertial Sensor
2.2.2 Hardware Specifications
2.3 Action Recognition
2.3.1 Related Work
2.3.2 Violin Action Recognition
Chapter 3: Fundamentals of Deep Learning
3.1 Artificial Neural Networks
3.1.1 Learning Mechanisms of Neural Networks
3.1.2 History of Neural Networks
3.2 Deep Learning
3.2.1 Convolutional Neural Networks
3.2.2 3D Convolutional Neural Networks
3.2.3 Recurrent Neural Networks
3.2.4 Long Short-Term Memory Models
Chapter 4: The Proposed Violin Action Recognition System and Decision Fusion
4.1 System Architecture
4.2 Violin Action Recognition
4.2.1 Preprocessing
4.2.2 Deep Learning Models
4.3 Decision-Level Fusion of Multiple Models
4.4 The VAP Violin Action Dataset
4.4.1 Bowing Techniques for Violin Action Recognition
4.4.2 Recording Environment Setup
Chapter 5: Experimental Results and Discussion
5.1 Experimental Environment
5.2 Comparison and Discussion of Results
Chapter 6: Conclusion and Future Work
References
References:
[1] Kinect official website: https://www.xbox.com/xbox-one/accessories/kinect
[2] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1297-1304, Jun. 2011.
[3] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon, "KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera," in Proc. 24th Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 559-568, Oct. 2011.
[4] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. Third International Conference on 3-D Digital Imaging and Modeling, Quebec City, Canada, pp. 145-152, 2001.
[5] C. Liu, Y. Hu, Y. Li, S. Song, and J. Liu, "PKU-MMD: A large scale benchmark for continuous multi-modal human action understanding," arXiv:1703.07475 [cs.CV], 2017.
[6] UTD-MHAD dataset: https://personal.utdallas.edu/~kehtar/UTD-MHAD.html (C. Chen et al., IEEE ICIP 2015).
[7] D. Webster and O. Celik, "Systematic review of Kinect applications in elderly care and stroke rehabilitation," Journal of NeuroEngineering and Rehabilitation, vol. 11, 2014.
[8] H. P. Gupta, H. S. Chudgar, S. Mukherjee, T. Dutta, and K. Sharma, "A continuous hand gestures recognition technique for human-machine interaction using accelerometer and gyroscope sensors," IEEE Sensors Journal, vol. 16, pp. 6425-6432, 2016.
[9] C. Chen, R. Jafari, and N. Kehtarnavaz, "Improving human action recognition using fusion of depth camera and inertial sensors," IEEE Transactions on Human-Machine Systems, vol. 45, no. 1, pp. 51-61, Feb. 2015.
[10] S. W. Lee and K. Mase, "Activity and location recognition using wearable sensors," IEEE Pervasive Computing, vol. 1, no. 3, pp. 24-32, 2002.
[11] J. G. Lee, M. S. Kim, T. M. Hwang, and S. J. Kang, "A mobile robot which can follow and lead human by detecting user location and behavior with wearable devices," in Proc. IEEE International Conference on Consumer Electronics, pp. 209-210, Jan. 2016.
[12] R. Xie and J. Cao, "Accelerometer-based hand gesture recognition by neural network and similarity matching," IEEE Sensors Journal, vol. 16, no. 11, pp. 4537-4545, 2016.
[13] N. Dawar and N. Kehtarnavaz, "Action detection and recognition in continuous action streams by deep learning-based sensing fusion," IEEE Sensors Journal, vol. 18, no. 23, pp. 9660-9668, Dec. 2018.
[14] W. Li, Z. Zhang, and Z. Liu, "Action recognition based on a bag of 3D points," in Proc. IEEE CVPR Workshops, San Francisco, CA, USA, pp. 9-14, Jun. 2010.
[15] C. Chen, R. Jafari, and N. Kehtarnavaz, "Action recognition from depth sequences using depth motion maps-based local binary patterns," in Proc. IEEE Winter Conference on Applications of Computer Vision, Waikoloa Beach, HI, USA, pp. 1092-1099, Jan. 2015.
[16] D. Dalmazzo and R. Ramirez, "Air violin: A machine learning approach to fingering gesture recognition," in Proc. 1st ACM SIGCHI International Workshop on Multimodal Interaction for Education, pp. 63-66, 2017.
[17] D. C. Dalmazzo and R. Ramirez, "Bowing gestures classification in violin performance: A machine learning approach," Frontiers in Psychology, vol. 10, art. 344, 2019.
[18] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943.
[19] F. A. Makinde, C. T. Ako, O. D. Orodu, and I. U. Asuquo, "Prediction of crude oil viscosity using feed-forward back-propagation neural network (FFBPNN)," Petroleum and Coal, vol. 54, pp. 120-131, 2012.
[20] D. O. Hebb, The Organization of Behavior, New York: Wiley & Sons, 1949.
[21] M. Minsky and S. Papert, Perceptrons, Cambridge, MA: MIT Press, 1969.
[22] P. J. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. thesis, Harvard University, 1974.
[23] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[24] S. Lawrence et al., "Face recognition: A convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, 1997.
[25] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[26] I. Mrazova and M. Kukacka, "Hybrid convolutional neural networks," in Proc. 6th IEEE International Conference on Industrial Informatics (INDIN), 2008.
[27] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980.
[28] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[29] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
[30] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[32] N. Dawar, S. Ostadabbas, and N. Kehtarnavaz, "Data augmentation in deep learning-based fusion of depth and inertial sensing for action recognition," IEEE Sensors Letters, vol. 3, no. 1, pp. 1-4, 2019.
[33] W. Li, C. Chen, H. Su, and Q. Du, "Local binary patterns and extreme learning machine for hyperspectral imagery classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 7, pp. 3681-3693, 2015.
[34] TensorFlow: an open-source Python package for machine intelligence, https://www.tensorflow.org, retrieved Dec. 1, 2016.
[35] J. Dean et al., "Large-scale deep learning for building intelligent computer systems," in Proc. Ninth ACM International Conference on Web Search and Data Mining, pp. 1-1, Feb. 2016.
[36] Keras official website: https://keras.io/
Advisor: Pao-Chi Chang (張寶基)    Approval Date: 2020-07-23