Master's/Doctoral Thesis 109525003: Detailed Record




Author: 王承凱 (Cheng-Kai Wang)    Department: Graduate Institute of Software Engineering
Thesis Title: Decomposing End-to-End Backpropagation Based on SCPL
(original title: 利用 SCPL 分解端到端倒傳遞演算法)
Related theses:
★ Predicting users' personal information and personality traits from web browsing history
★ Predicting changes in users' browsing behavior before special holidays with a multi-target matrix factorization method
★ Predicting the distribution and volume of traffic demand: an AR-LSTMs model based on multiple attention mechanisms
★ A study of dynamic multi-model fusion analysis
★ Extending clickstreams: analyzing user behaviors missing from clickstreams
★ Associated Learning: decomposing end-to-end backpropagation based on autoencoders and target propagation
★ A click prediction model that fuses multi-model ranking
★ Analyzing intentional, unintentional, and missing user behaviors in web logs
★ Adjusting word vectors with synonym and antonym information using a non-directional sequence encoder based on self-attention
★ Exploring when to use deep learning versus simple learning models for click-through-rate prediction
★ Fault detection for air-quality sensors: an anomaly detection framework based on deep spatio-temporal graph models
★ An empirical study of how word vectors adjusted with synonym/antonym lexicons affect downstream natural language tasks
★ Detecting hypernym-hyponym relations between words using auxiliary sentences and BERT
★ A semi-supervised model combining spatio-temporal data, applied to anomaly detection for PM2.5 air-pollution sensors
★ Training neural networks by adjusting DropConnect drop probabilities according to the gradient magnitudes of the weights
★ Detecting low-activity anomalous accounts on PTT with graph neural networks
  1. This electronic thesis is authorized for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the content without authorization.

Abstract (Chinese): Backpropagation (BP) is the cornerstone of the algorithms that update the weights of today's deep neural networks, but it is inefficient because of the backward locking problem. This thesis attempts to resolve backward locking with a new method named Supervised Contrastive Parallel Learning (SCPL). SCPL uses the supervised contrastive loss as the local objective function of each convolutional layer; because the local objective functions of different layers are isolated from one another, SCPL can learn the weights of different convolutional layers in parallel. The thesis also compares the proposed method with previous work on neural network parallelization, discusses the respective advantages and limitations of existing methods, and outlines future research directions for this topic.
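For concreteness, the supervised contrastive loss mentioned above has the following standard batch form (Khosla et al., 2020); attaching it to each layer ℓ, as written here with a per-layer superscript, is an illustrative reading of the abstract rather than the thesis's own notation:

\[
\mathcal{L}^{\mathrm{sup}}_{\ell} \;=\; \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\!\left(z^{\ell}_i \cdot z^{\ell}_p / \tau\right)}{\sum_{a \in A(i)} \exp\!\left(z^{\ell}_i \cdot z^{\ell}_a / \tau\right)}
\]

where I indexes the samples in a batch, A(i) = I \ {i}, P(i) is the set of samples in A(i) that share the label of sample i, z^ℓ are the normalized projections of layer ℓ's outputs, and τ is a temperature hyperparameter.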
Abstract (English): Backpropagation (BP) is the cornerstone of today's deep learning algorithms for updating the weights in deep neural networks, but it is inefficient partly because of the backward locking problem. This thesis proposes Supervised Contrastive Parallel Learning (SCPL) to address the issue of backward locking. SCPL uses the supervised contrastive loss as the local objective function for each layer. Because the local objective functions in different layers are isolated, SCPL can learn the weights of different layers in parallel. We compare SCPL with recent work on neural network parallelization, discuss the advantages and limitations of the existing methods, and suggest future research directions for neural network parallelization.
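To make the mechanism described in the abstracts concrete, below is a minimal sketch assuming a PyTorch-style implementation; the block structure, projection head, and names such as LocalBlock and supervised_contrastive_loss are illustrative assumptions, not the thesis's released code. The key point is that each block is trained only by its own supervised contrastive loss on detached inputs, so gradients never cross block boundaries and the per-block updates could in principle run in parallel, which is how backward locking is avoided.

# Illustrative sketch of layer-local training with a supervised contrastive objective.
# Names and architecture are hypothetical; this is not the thesis's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al., 2020) over one batch.

    z: (N, d) projected features; labels: (N,) integer class labels.
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                              # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))            # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    # average log-probability over each anchor's positives, then over the batch
    return -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count).mean()


class LocalBlock(nn.Module):
    """One convolutional block with its own projection head and local objective."""

    def __init__(self, in_ch, out_ch, proj_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2))
        # projection head used only for the local contrastive loss
        self.proj = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, proj_dim))

    def forward(self, x):
        return self.conv(x)


blocks = nn.ModuleList([LocalBlock(3, 32), LocalBlock(32, 64), LocalBlock(64, 128)])
optimizers = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]


def train_step(x, y):
    # Each block is updated from its own loss only; detaching the activations passed
    # forward keeps the local objectives isolated from one another.
    for block, opt in zip(blocks, optimizers):
        h = block(x)
        loss = supervised_contrastive_loss(block.proj(h), y)
        opt.zero_grad()
        loss.backward()        # gradient stays inside this block
        opt.step()
        x = h.detach()         # stop gradients from reaching earlier blocks

In an actual parallel implementation, each block would run on its own worker and receive detached activations from the previous block; the sequential loop above only illustrates the gradient isolation that makes such parallelism possible.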
Keywords (Chinese) ★ backpropagation (倒傳遞)
★ backward locking (反向鎖定)
★ supervised contrastive loss (監督對比損失函數)
★ parallel training (平行化訓練)
★ supervised contrastive parallel learning (監督式對比平行學習)
Keywords (English) ★ Backpropagation
★ backward locking
★ supervised contrastive loss
★ parallel learning
★ supervised contrastive parallel learning
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
1. Introduction
2. Related Work
3. Model and Methods
3.1 The mechanism of contrastive learning
3.2 The supervised contrastive loss
3.3 Learning mechanism and network architecture
3.4 Inference function and hypothesis space
3.5 Comparison with other methods
3.6 Model pseudocode
4. Experimental Results and Analysis
4.1 Experimental settings and implementation details
4.1.1 Experimental settings
4.1.2 Implementation details
4.2 Classification accuracy
4.2.1 CIFAR-10
4.2.2 CIFAR-100
4.2.3 TinyImageNet-val
4.3 Generalization tests
4.4 Ablation studies
4.4.1 Data augmentation
4.4.2 Batch size
4.4.3 Projection head
4.5 Discussion
4.5.1 Method comparison and analysis
4.5.2 Open issues
5. Conclusion
5.1 Conclusions
5.2 Future work
References
Appendix A: Experiment code
Advisor: 陳弘軒 (Hung-Hsuan Chen)    Date of approval: 2022-7-19
