

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92312


Title: Multi-Scale Vision Foundation Networks for Visual Tracking
Authors: Wang, Pin-Feng (王品灃)
Contributors: Department of Communication Engineering
Keywords: single object tracking; hierarchical; re-pretraining; vision Transformer; template update strategy
Date: 2023-07-19
Issue Date: 2023-10-04 15:26:13 (UTC+8)
Publisher: National Central University
Abstract: In single object tracking, trackers built on hierarchical Vision Transformer (ViT) architectures usually perform worse than those built on a plain ViT. At the same time, the network architectures of state-of-the-art trackers differ from one another, so there is no general-purpose network architecture. This thesis presents HyperXTrack, the first backbone network architecture applied as the interaction network in visual tracking. In addition, the proposed backbone interacts over spatio-temporal context, where the spatial context is multi-scale information and the temporal context provides historical information. HyperXTrack performs both global and local spatial interaction, and its computational complexity is linear in image resolution. In each block, it first correlates fine local texture features and then performs cross-interaction over the contour of the entire object. The interaction backbone adopts the attention mechanism proposed in this thesis, together with the classic stacking rule of applying convolutions before the attention mechanism. Finally, this thesis proposes a lightweight re-pretraining strategy: starting from pre-trained MaxViT weights, the network with its modified interaction operations is re-pretrained for only one epoch, after which its parameters can be transferred to downstream tasks. The experimental results show that HyperXTrack surpasses OSTrack's 71% AO with 71.8% on the GOT-10k dataset, and its hierarchical architecture needs only 30M parameters to surpass OSTrack's 93M-parameter ViT architecture.
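The local-then-global interaction with linear complexity described in the abstract follows the MaxViT pattern: attend within non-overlapping p×p windows (local texture), then within a dilated p×p grid of strided tokens (global contour). For a fixed window size p, each token attends to only p² others, so cost grows linearly with the number of tokens. The following is a minimal single-head NumPy sketch of that decomposition, not the thesis code; the function names and the absence of learned projections are simplifications for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes: (..., T, d).
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def block_attention(x, p):
    # Local interaction: partition the H x W map into non-overlapping
    # p x p windows and attend within each window independently.
    H, W, C = x.shape
    w = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    w = w.reshape(-1, p * p, C)              # (num_windows, p*p, C)
    out = attention(w, w, w)
    out = out.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)

def grid_attention(x, p):
    # Global (dilated) interaction: each group holds p x p tokens sampled
    # at stride H/p x W/p, so attention spans the whole image.
    H, W, C = x.shape
    g = x.reshape(p, H // p, p, W // p, C).transpose(1, 3, 0, 2, 4)
    g = g.reshape(-1, p * p, C)              # (num_groups, p*p, C)
    out = attention(g, g, g)
    out = out.reshape(H // p, W // p, p, p, C).transpose(2, 0, 3, 1, 4)
    return out.reshape(H, W, C)

x = np.random.default_rng(0).standard_normal((8, 8, 4))
y = grid_attention(block_attention(x, 4), 4)  # local, then global
```

Stacking `block_attention` then `grid_attention` gives every token an image-wide receptive field in two steps while touching only p² tokens per attention call; MaxViT additionally precedes each such pair with a convolution (MBConv), matching the "convolutions before attention" stacking rule the abstract mentions.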
Appears in Collections: [Graduate Institute of Communication Engineering] Theses & Dissertations

Files in This Item:

File: index.html (HTML, 0Kb, 31 views)


All items in NCUIR are protected by copyright, with all rights reserved.
