Abstract (English) |
In recent years, with the development of the Internet of Things (IoT) and deep learning, artificial intelligence has been applied in an increasing number of settings. The appearance of smart speakers has changed consumers' habits by letting them give spoken instructions directly. This trend suggests that future home appliances will increasingly accept voice commands. However, unlike a personal computer, which has an operating system to allocate computing resources, most home appliances are built from multiple microcontrollers that repeatedly perform their functions. To control a microcontroller with voice commands, a wake-word recognition system must therefore run on the microcontroller itself.
In this thesis, we use depthwise separable convolution to implement the wake-word recognition model. Depthwise separable convolution greatly reduces the number of parameters, which is very helpful for microcontrollers with limited memory and computing power. The system first converts the voice data into features using Mel-frequency cepstral coefficients (MFCC); a neural network is then trained to learn the wake-word classes and to identify whether the features contain a wake word. |
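As a rough illustration of the parameter saving mentioned in the abstract, the sketch below counts the weights of a standard 2-D convolution and of a depthwise separable one. The layer sizes (a 3x3 kernel with 64 input and 64 output channels) and the helper names are illustrative assumptions, not values or code taken from the thesis, and bias terms are ignored.

```python
def standard_conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights in a standard 2-D convolution (bias ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


def depthwise_separable_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights in a depthwise separable convolution (bias ignored)."""
    # Depthwise step: one kernel_h x kernel_w filter per input channel.
    depthwise = kernel_h * kernel_w * in_channels
    # Pointwise step: a 1x1 convolution that mixes the channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
standard = standard_conv_params(3, 3, 64, 64)
separable = depthwise_separable_params(3, 3, 64, 64)

print("standard conv weights:  ", standard)
print("separable conv weights: ", separable)
print("reduction factor:       ", round(standard / separable, 2))
```

For this assumed layer shape the separable form needs several times fewer weights. The actual saving in the thesis model depends on its real kernel sizes and channel counts.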
References |
[1] Y. Zhang, N. Suda, L. Lai, and V. Chandra, "Hello Edge: Keyword Spotting on Microcontrollers," arXiv:1711.07128 [cs, eess], Feb. 2018. Accessed: Jun. 2, 2020. [Online]. Available: http://arxiv.org/abs/1711.07128.
[2] "tutorial on hmm and applications.pdf." Accessed: Jun. 2, 2020. [Online]. Available: https://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf.
[3] LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
[4] Liu, Pengfei & Qiu, Xipeng & Huang, Xuanjing. (2016). Recurrent Neural Network for Text Classification with Multi-Task Learning.
[5] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink and J. Schmidhuber, "LSTM: A Search Space Odyssey," in IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017, doi: 10.1109/TNNLS.2016.2582924.
[6] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv:1412.3555 [cs], Dec. 2014. Accessed: Jun. 2, 2020. [Online]. Available: http://arxiv.org/abs/1412.3555.
[7] Sainath, Tara N. / Parada, Carolina (2015): "Convolutional neural networks for small-footprint keyword spotting", In INTERSPEECH-2015, 1478-1482.
[8] "Microcontroller.pdf." Accessed: Jun. 2, 2020. [Online]. Available: https://ti.tuwien.ac.at/ecs/teaching/courses/mclu/theory-material/Microcontroller.pdf.
[9] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 1800-1807, doi: 10.1109/CVPR.2017.195.
[10] Ioffe, Sergey & Szegedy, Christian. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
[11] P. Warden, "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition," arXiv:1804.03209 [cs], Apr. 2018. Accessed: Jun. 15, 2020. [Online]. Available: http://arxiv.org/abs/1804.03209.
[12] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980 [cs], Jan. 2017. Accessed: Jun. 2, 2020. [Online]. Available: http://arxiv.org/abs/1412.6980. |