

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95480


Title: Animating Landscapes: Integrating Perlin Noise and Diffusion Model for ShanShui Scenery (結合Perlin Noise與Diffusion Model的山水風景動畫生成)
Authors: Hsu, Shou-En (徐紹恩)
Contributors: Department of Computer Science and Information Engineering
Keywords: Automatic Animation Generation; Automatic Image Generation; Deep Learning; Artificial Intelligence; ShanShui Landscapes; Diffusion Model; GPT; Perlin Noise; AnimateDiff; Stable Diffusion
Date: 2024-07-13
Upload time: 2024-10-09 16:53:39 (UTC+8)
Publisher: National Central University
    摘要: 當前深度學習模型在生成東方山水風景圖像時表現不佳,主要原因是現有訓練資
    料集中此類數據量有限,導致模型難以準確學習和理解其特徵和結構。為了解決這一
    問題,我們提出了幾種改進策略。
    首先,我們旨在擴充東方山水風景數據集,以提高模型對該類型風景的理解能
    力。其次,我們計劃利用DreamBooth的微調技術對DiffusionModel進行調整,並引入
    更多與東方山水風景相關的提示詞,如地形特徵、植被分佈和色彩風格等。此外,我
    們提議開發一個山水骨架生成模組,使用PerlinNoise生成骨架圖,並通過ControlNet
    限制Diffusion Model,使生成圖像在骨架圖的基礎上增加色彩,從而增強畫面豐富度。
    為了讓Diffusion Model 更好地理解我們所需生成的圖像,我們在生成圖片的提示
    詞中使用GPT-4來擴充這些提示詞。我們將骨架圖輸入GPT-4,生成對於骨架圖結構
    的詳細敘述,並根據這些敘述進行對應風格上的改寫。這樣可以讓DiffusionModel更
    好地理解使用者所需的提示詞,並改善提示詞描述不清楚的問題。
    為了進一步提升生成圖像的品質,我們引入了TextualInversion,特別應用於負面
    提示,以改善DiffusionModel對於品質不佳圖像的理解,從而避免生成低質量的圖像。
    此外,將Diffusion Model 生成的著色圖進行I2V編碼處理,生成影片Diffusion
    Model(AnimateDiff)的輸入,最終由 AnimateDiff 生成生動的動畫。這一整合的工作
    流程允許用戶靈活調整各步驟以滿足特定需求,例如在骨架圖上添加物件、調整著色
    提示詞以及設置動畫框架的個別提示詞。
    我們還比較和評估了AnimateDiff中不同版本的運動模塊,以確定最適合我們系統
    的版本。實驗結果表明,引入TextualInversion顯著提升了圖像生成的品質。我們進一
    步比較了不同版本的ControlNet對生成圖像的影響。在附錄部分,我們提供了幾種靈
    活調整生成圖像的方法,並附上相應的結果。
    通過針對性的研究和創新方法的應用,我們有信心提升生成效果,從而豐富和拓
    展深度學習在圖像生成領域的應用,為人們帶來更多樣化和高品質的視覺體驗。;Current deep learning models exhibit suboptimal performance in generating ShanShui
    landscape images, primarily due to the limited representation of such data in existing training
    datasets. This deficiency hampers the models’ ability to accurately learn and understand the
    distinctive features and structures of ShanShui landscapes. To address this issue, we propose
    several improvement strategies.
    Firstly, we aim to expand the dataset with more ShanShui landscape images to enhance
    the model’s comprehension of this specific type of scenery. Secondly, we plan to fine-tune
    the Diffusion Model using DreamBooth and introduce additional prompts related to ShanShui
    landscapes, such as terrain features, vegetation distribution, and color styles. Additionally, we
    propose developing a landscape skeleton generation module that employs Perlin Noise to create
    skeleton images. By leveraging ControlNet to constrain the Diffusion Model, we can enrich the
    visual output by adding color to these skeleton images.
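
As a rough illustration of this skeleton-plus-ControlNet step, the following Python sketch samples a Perlin-noise heightfield, thresholds it into a black-and-white skeleton image, and feeds it to a scribble-conditioned Stable Diffusion pipeline. The `noise` package, the `lllyasviel/sd-controlnet-scribble` and `runwayml/stable-diffusion-v1-5` checkpoints, and the thresholding scheme are our assumptions for the sketch, not details taken from the thesis.

```python
import numpy as np
import torch
from noise import pnoise2                      # pip install noise
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

def perlin_skeleton(width=512, height=512, scale=0.008, octaves=4, seed=0):
    """Sample 2D Perlin noise and keep a thin band around its zero level set,
    so ridge lines come out as ink-like strokes (our assumed scheme)."""
    field = np.array([[pnoise2(x * scale, y * scale, octaves=octaves, base=seed)
                       for x in range(width)] for y in range(height)],
                     dtype=np.float32)
    strokes = (np.abs(field) < 0.03).astype(np.uint8) * 255
    return Image.fromarray(strokes).convert("RGB")

skeleton = perlin_skeleton()
skeleton.save("skeleton.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# ControlNet constrains the layout to the skeleton; the prompt supplies color and style.
colored = pipe("a traditional ShanShui ink painting, misty mountains, rivers",
               image=skeleton, num_inference_steps=25).images[0]
colored.save("colored.png")
```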
    To further improve the model’s understanding of the desired output, we utilize GPT-4 to
    augment the prompts used for generating images. Specifically, we input the skeleton images
    into GPT-4 to generate detailed descriptions of their structure, which are then used to refine the
    corresponding style prompts. This approach enhances the Diffusion Model’s comprehension of
    user requirements and mitigates issues related to unclear prompt descriptions.
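
A minimal sketch of this prompt-augmentation step, assuming the OpenAI Python SDK with the vision-capable `gpt-4o` model as a stand-in for whichever GPT-4 variant the thesis used; the instruction text is illustrative only:

```python
import base64
from openai import OpenAI                       # pip install openai

client = OpenAI()                               # expects OPENAI_API_KEY in the environment

with open("skeleton.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe the terrain structure in this skeleton image, "
                     "then rewrite the description as a Stable Diffusion prompt "
                     "in the style of a ShanShui ink painting."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
augmented_prompt = resp.choices[0].message.content
```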
    To further enhance the quality of generated images, we incorporate Textual Inversion,
    particularly applied to negative prompts, to improve the Diffusion Model’s understanding of
    low-quality images and help avoid generating them.
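
In diffusers, a learned negative-quality embedding can be attached to the same pipeline via `load_textual_inversion`; the embedding file and the `<bad-quality>` token below are placeholders for whatever embedding the thesis trained:

```python
# Hypothetical learned embedding capturing "what a low-quality image looks like".
pipe.load_textual_inversion("embeddings/bad-quality.pt", token="<bad-quality>")

colored = pipe("a traditional ShanShui ink painting, misty mountains, rivers",
               image=skeleton,
               negative_prompt="<bad-quality>, blurry, oversaturated",
               num_inference_steps=25).images[0]
```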
    Furthermore, the colored images generated by the Diffusion Model are processed through
an I2V encoder to create input data for a video Diffusion Model (AnimateDiff), which ultimately
    produces vivid animations. This integrated workflow allows users to flexibly adjust each step to
    meet specific needs, such as adding objects to the skeleton image, modifying coloring prompts,
    and setting individual prompts for specific animation frames.
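
The animation stage could look roughly like the diffusers AnimateDiff pipeline below. Note that this public pipeline is text-to-video; the thesis's I2V encoding of the colored image is not reproduced here, and the motion-adapter checkpoint name is an assumption:

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
anim = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter,
    torch_dtype=torch.float16)
anim.scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler",
    beta_schedule="linear", clip_sample=False)
anim.to("cuda")

out = anim(prompt="a ShanShui landscape, drifting mist, flowing water, gentle wind",
           negative_prompt="blurry, low quality",
           num_frames=16, num_inference_steps=25)
export_to_gif(out.frames[0], "shanshui.gif")
```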
We also compare and evaluate various versions of the Motion Module within AnimateDiff
    to identify the most suitable version for our system. Experimental results demonstrate that the
incorporation of Textual Inversion significantly enhances image generation quality. Additionally,
we compare the effects of different versions of ControlNet on the generated images. In the
    appendix, we provide several methods for flexibly adjusting the generated images, along with
    the corresponding results.
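
To compare motion-module versions as described, one could swap adapters on the same base model; `v1-5-2` and `v1-5-3` are publicly available adapter checkpoints, used here only as examples of the kind of comparison the thesis ran:

```python
for name in ("guoyww/animatediff-motion-adapter-v1-5-2",
             "guoyww/animatediff-motion-adapter-v1-5-3"):
    adapter = MotionAdapter.from_pretrained(name, torch_dtype=torch.float16)
    anim = AnimateDiffPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", motion_adapter=adapter,
        torch_dtype=torch.float16).to("cuda")
    out = anim(prompt="a ShanShui landscape, drifting mist",
               num_frames=16, num_inference_steps=25)
    export_to_gif(out.frames[0], f"compare-{name.split('/')[-1]}.gif")
```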
    Through targeted research and the application of innovative methods, we are confident in
    our ability to improve generation performance, thereby enriching the diversity and quality of
    visual experiences in the field of deep learning image generation.
Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in This Item:

File          Description   Size    Format   Views
index.html                  0 KB    HTML     33


All items in NCUIR are protected by the original copyright.
