Master's/Doctoral Thesis 108426018: Detailed Record




Name: Ming-Jhe Dai (戴銘哲)    Graduate Institute: Graduate Institute of Industrial Management
Thesis Title (Chinese): 建構神經網路模型基於貝氏推論—以時間序列資料為例
Thesis Title (English): Bayesian Inference for Neural Network Model with Application on Time Series Data
Related Theses
★ Applying Failure Mode and Effects Analysis to Improve Product Development Schedules
★ A Study of the Relationship between Service Quality Factors and Customer Satisfaction: The Case of Automobile Repair Shop Services
★ A Study of Household Car-Purchase Decisions and Marketing Strategies
★ A Study of Taxi Fleet Dispatching Operations
★ An Investigation of Service Quality and Service Failures in the Electric Power Industry: The Case of the Taipower Taoyuan District Office
★ Applying Data Mining to Investigate Defective Notebook Computer Components: The Case of Company A
★ Automotive Accessory Development and Car Owners' Purchase Intention: A Case Study of Company C's Automotive Accessory Business
★ Applying the Taguchi Method to Optimize Resistance Welding Conditions for Advanced High-Strength Steel Sheets
★ Using the Analytic Hierarchy Process to Evaluate Selection Factors for Third-Party Logistics Services: The Case of Japanese Manufacturers in Taiwan
★ A Study of Optimal Lot Sizing under Variable Yield
★ A Study of Supplier Selection Using the Analytic Hierarchy Process under a Vendor-Managed Inventory Framework: The Case of an Electronics Contract Manufacturer
★ An Analysis of Sales Forecasting Models for Fast-Moving Consumer Goods in Taiwan: The Case of Lianhwa Foods' Koloko
★ An Analysis of Competitive Advantage and Customer Satisfaction: The Case of China Motor Corporation
★ The Impact of Adopting Green Procurement on Electronics Contract Manufacturers: The Case of Company A
★ Supplier Selection in the Rail Transportation Industry Using the Delphi Method and the Analytic Hierarchy Process: The Case of Company T
★ Using a Simulation System to Improve Inventory Management and Service Levels: The Case of the Wire and Cable Manufacturing Industry
Files: full text available for online browsing after 2026-01-01
Abstract (Chinese)
Advances in deep learning algorithms, combined with the increased computing power of modern software and hardware, have made neural network models much easier to train, which has produced a large body of related research and industrial applications. Each input is associated with a weight, and the result is produced through an activation function. In most models, however, each weight is a single fixed value: no matter how complex the model, the weight corrected by backpropagation remains a fixed point estimate, and the output is computed from that single value alone, which makes the model insufficiently robust.
A neural network model that incorporates Bayesian inference can be viewed as a conditional distribution model, in which each weight becomes a distribution rather than a fixed value, and the prediction is obtained by averaging over the posterior distribution of the weights. Because averaging the predictions of every possible network is computationally intractable, we adopt Monte Carlo Dropout as an approximation to Bayesian inference.
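As a sketch (the notation here is ours, not taken from the thesis), the averaging described above is the posterior predictive distribution, and Monte Carlo Dropout, following Gal and Ghahramani, approximates it with T stochastic forward passes whose dropout masks remain active at test time:

\[
p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, \omega)\, p(\omega \mid \mathcal{D})\, d\omega \;\approx\; \frac{1}{T} \sum_{t=1}^{T} p(y^{*} \mid x^{*}, \hat{\omega}_{t}), \qquad \hat{\omega}_{t} \sim q(\omega),
\]

where q(\omega) is the approximating distribution induced by dropout, and the predictive mean is estimated as \frac{1}{T} \sum_{t=1}^{T} f(x^{*}; \hat{\omega}_{t}).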
Abstract (English)
Deep learning approaches, together with developments in software and hardware computing capacity, have made it easier to train neural network models, resulting in a flurry of related research and industrial applications. Each input is associated with a weight value, and the activation function produces the output. However, in most models each weight is a fixed value: regardless of how complex the model is built, the weight corrected by backpropagation remains a single fixed value, and the output relies solely on that value, so the model is not robust enough.
The neural network model that uses Bayesian inference can be described as a conditional distribution model: each weight changes from a fixed value to a distribution, and the prediction is derived by averaging over the posterior distribution of the weights. Because computing and averaging the predictions of all possible networks is too complicated, we use Monte Carlo Dropout as the Bayesian approximation.
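A minimal sketch of how Monte Carlo Dropout prediction could look in practice, assuming a Keras LSTM regressor; the layer sizes, dropout rate, and number of passes are illustrative assumptions, not the configuration used in the thesis:

import numpy as np
import tensorflow as tf

# Illustrative LSTM regressor; the architecture and dropout rate are assumed here.
def build_model(timesteps, n_features):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, n_features)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1),
    ])

def mc_dropout_predict(model, x, n_passes=100):
    # Keep dropout active at inference (training=True) and collect T stochastic passes.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_passes)])
    # The sample mean is the point prediction; the sample std reflects model uncertainty.
    return preds.mean(axis=0), preds.std(axis=0)

The mean over the passes plays the role of the posterior predictive mean in the formula above, and the spread across passes gives an uncertainty estimate for each forecast.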
Keywords (Chinese)
★ 貝氏推論 (Bayesian Inference)
★ 神經網路 (Neural Network)
★ 時間序列 (Time Series)
Keywords (English)
★ Bayesian Inference
★ Neural Network
★ Time Series
Table of Contents
Chinese Abstract i
Abstract ii
Content iii
List of Figures v
List of Tables vi
Chapter 1 1
1.1 Research Background 1
1.2 Research Objective 2
Chapter 2 3
2.1 Deep Learning 3
2.2 Recurrent Neural Network (RNN) 4
2.3 Long Short-Term Memory (LSTM) 5
2.3.1 Loss Function 7
2.3.2 Activation Function 8
2.3.3 Optimization Algorithm 10
2.3.4 Dropout Rate 12
2.4 Bayesian Neural Network (BNN) 12
Chapter 3 15
3.1 LSTM Model 15
3.2.1 Architecture 15
3.2.2 MC dropout 17
3.2.3 Batch Normalization 20
3.2.4 Hyperparameters Selection 22
3.3 Evaluation Metrics 22
Chapter 4 23
4.1 Data Collection 23
4.2 Data Preprocessing 23
4.3 Experiment Environment 24
4.4 Results 25
Chapter 5 29
5.1 Conclusion 29
5.2 Future Work 29
References 31
Advisor: Ying-Chieh Yeh (葉英傑)    Date of Approval: 2021-07-14
