NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/89934


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89934


    Title: Monocular Based 3D Human Pose Estimation with Refinement Block and Special Loss Function
    Author: Luo, Yi-Jhen (羅翊甄)
    Contributor: Department of Electrical Engineering
    Keywords: 3D human pose estimation; Monocular based human pose estimation; Convolution neural network
    Date: 2022-08-03
    Upload time: 2022-10-04 12:05:10 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, advances in GPU computing power and in algorithms have driven significant progress in deep learning across many tasks. Image-based applications in particular are now widely used in daily life: face recognition for unlocking devices, license plate recognition in parking lots, and product defect detection have all matured, which illustrates the impact of deep learning.
    Recently, 3D human pose estimation (HPE) from a monocular RGB image has attracted much attention following the success of deep convolutional neural networks. Many algorithms treat 2.5D heatmaps as 3D coordinates, in which the X and Y axes correspond to image coordinates and the Z axis corresponds to camera coordinates. The camera matrix, or the distance between the root joint and the camera (ground-truth information), is therefore usually needed to transform the 2.5D coordinates into 3D space, which limits applicability in the real world. 2.5D heatmaps also ignore the conversion between 2D and 3D positions, so the features associated with that conversion are lost.
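The 2.5D-to-3D conversion mentioned above can be illustrated with a standard pinhole back-projection. The following is a minimal sketch, not the thesis's code: the function name `backproject_2p5d`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the numeric values are illustrative assumptions. It shows why the absolute root depth (or the camera matrix) is required: without `root_depth`, the pixel coordinates cannot be scaled into metric X and Y.

```python
import numpy as np

def backproject_2p5d(joints_2p5d, root_depth, fx, fy, cx, cy):
    """Convert (u, v, z_rel) joints to camera-space (X, Y, Z).

    joints_2p5d: array of shape (J, 3); u and v are pixel coordinates,
                 z_rel is depth relative to the root joint.
    root_depth:  absolute root-to-camera depth (same unit as z_rel).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths in pixels,
                 principal point in pixels).
    """
    joints = np.asarray(joints_2p5d, dtype=np.float64)
    u, v, z_rel = joints[:, 0], joints[:, 1], joints[:, 2]
    z = z_rel + root_depth        # absolute depth of every joint
    x = (u - cx) * z / fx         # undo the perspective projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical values: two joints, the first one being the root.
joints = np.array([[500.0, 500.0, 0.0],
                   [600.0, 400.0, 100.0]])
cam = backproject_2p5d(joints, root_depth=4000.0,
                       fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
print(cam)
```

Because both `root_depth` and the intrinsics come from outside the image, a method that depends on this step needs information that is often unavailable at inference time.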
    In this paper, we present an end-to-end framework that uses the contextual information in an RGB image to predict the 3D skeleton directly from a monocular image. Specifically, we use a multi-loss method based on 2D heatmaps and volumetric heatmaps, together with a refinement block, to locate the root-relative 3D human pose. Our approach uses the 2D heatmaps and volumetric heatmaps as features for computing losses and combines them with the loss on the relative 3D location to form the total loss. The model jointly learns the 2D heatmap features and the 3D location, and focuses on the root-relative 3D position in camera coordinates. Experimental results show that our model predicts the root-relative 3D human pose well on Human3.6M.
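The combination of losses described above might be sketched as follows. The abstract does not state the form of each loss term or their weights, so the mean-squared error on the heatmaps, the L1 error on the joint positions, and the weights `w_2d`, `w_vol`, `w_3d` are all assumptions chosen for illustration. The refinement block is not modeled here.

```python
import numpy as np

def root_relative(joints_3d, root_index=0):
    """Express every joint relative to the root joint."""
    joints = np.asarray(joints_3d, dtype=np.float64)
    return joints - joints[root_index]

def total_loss(pred_hm2d, gt_hm2d, pred_vol, gt_vol, pred_3d, gt_3d,
               w_2d=1.0, w_vol=1.0, w_3d=1.0, root_index=0):
    """Weighted sum of three terms (loss types and weights are assumed)."""
    # Assumed: mean-squared error on 2D heatmaps of shape (J, H, W).
    loss_2d = np.mean((np.asarray(pred_hm2d) - np.asarray(gt_hm2d)) ** 2)
    # Assumed: mean-squared error on volumetric heatmaps of shape (J, D, H, W).
    loss_vol = np.mean((np.asarray(pred_vol) - np.asarray(gt_vol)) ** 2)
    # Assumed: L1 error on root-relative 3D joint positions of shape (J, 3).
    diff = root_relative(pred_3d, root_index) - root_relative(gt_3d, root_index)
    loss_3d = np.mean(np.abs(diff))
    return w_2d * loss_2d + w_vol * loss_vol + w_3d * loss_3d

# Hypothetical toy data: 2 joints, 4x4 heatmaps, 4x4x4 volumes.
hm = np.zeros((2, 4, 4))
hm[0, 1, 1] = 1.0
vol = np.zeros((2, 4, 4, 4))
vol[1, 2, 2, 2] = 1.0
gt_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 3.0]])
loss = total_loss(hm, hm, vol, vol, pred_3d, gt_3d)
print(loss)
```

Subtracting the root joint makes the 3D term invariant to a global translation of the predicted skeleton, which matches the abstract's focus on root-relative position rather than absolute position in camera space.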
    Appears in Collections: [Graduate Institute of Electrical Engineering] Theses and Dissertations

    Files in this item:

    File          Description   Size   Format   Views
    index.html                  0Kb    HTML     36


    All items in NCUIR are protected by original copyright.

