    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93186


    Title: Homogeneous and Heterogeneous Ensemble Resampling Approaches for the Class Imbalance Problem (同質性與異質性集成式重採樣方法於類別不平衡問題之研究)
    Author: Wang, Pei-Ting (王珮庭)
    Contributor: Department of Information Management
    Keywords: data mining; class imbalance; ensemble learning
    Date: 2023-07-18
    Uploaded: 2024-09-19 16:46:51 (UTC+8)
    Publisher: National Central University
    Abstract: In data mining, collected data often suffer from quality problems such as duplicate values, missing values, outliers, and inconsistent formats, all of which make it harder to extract useful information. Class imbalance is another important problem: because events in the real world occur with different probabilities, some classes are much rarer than others. Models trained on such data tend to predict the minority class poorly, which undermines the accuracy and reliability of the resulting analysis.
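Accuracy as a metric hides the drop in minority-class performance. The sketch below uses a hypothetical 95:5 label set, not data from the thesis. A model that always predicts the majority class reaches 95% accuracy yet recalls none of the minority samples. The helper names are illustrative only.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return (majority class count) / (minority class count)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def minority_recall(y_true, y_pred, minority):
    """Fraction of true minority samples that were predicted as minority."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    total = sum(1 for t in y_true if t == minority)
    return hits / total

# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(imbalance_ratio(y_true))             # 19.0
print(accuracy)                            # 0.95
print(minority_recall(y_true, y_pred, 1))  # 0.0
```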
    This thesis therefore focuses on the class imbalance problem. Building on prior work, it resamples class-imbalanced datasets with data-level methods, pairs them flexibly with different classification algorithms, and examines whether adjusting the ratio between the majority and minority classes during resampling affects classification performance. Existing studies have not proposed (1) combining single classifiers, each trained on data produced by a different resampling method, into a multiple-classifier system, or (2) merging the samples produced by different resampling methods and training a single classifier or an ensemble classifier on the merged set. Building on ensemble methods, this study therefore proposes homogeneous and heterogeneous approaches and investigates which combination handles class imbalance better under different processing pipelines.
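The two combination strategies named above, multiple classifiers and sample merging, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Random undersampling and random oversampling stand in for whatever data-level methods the thesis evaluates. A nearest-centroid classifier stands in for its classification algorithms. All function names (`random_undersample`, `multiple_classifier`, `sample_merging`, and so on) are hypothetical. The `ratio` parameter is an assumed way to express the target minority-to-majority balance ratio after resampling. The abstract does not define exactly how the homogeneous and heterogeneous variants are built, so the sketch does not map onto them.

```python
import random
from collections import Counter

# Assumption: random under/oversampling and a nearest-centroid learner are
# illustrative stand-ins, not the methods used in the thesis.

def split_indices(y, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    mino = [i for i, label in enumerate(y) if label == minority]
    majo = [i for i, label in enumerate(y) if label != minority]
    return mino, majo

def random_undersample(X, y, minority, ratio, rng):
    """Drop random majority samples so n_minority / n_majority is about `ratio`."""
    mino, majo = split_indices(y, minority)
    n_keep = min(len(majo), round(len(mino) / ratio))
    idx = mino + rng.sample(majo, n_keep)
    return [X[i] for i in idx], [y[i] for i in idx]

def random_oversample(X, y, minority, ratio, rng):
    """Duplicate random minority samples so n_minority / n_majority is about `ratio`."""
    mino, majo = split_indices(y, minority)
    n_extra = max(0, round(len(majo) * ratio) - len(mino))
    idx = list(range(len(y))) + [rng.choice(mino) for _ in range(n_extra)]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_centroids(X, y):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def multiple_classifier(X, y, resamplers):
    """Strategy 1: one classifier per resampled set, combined by majority vote."""
    models = [fit_centroids(*resample(X, y)) for resample in resamplers]
    return lambda x: Counter(predict(m, x) for m in models).most_common(1)[0][0]

def sample_merging(X, y, resamplers):
    """Strategy 2: concatenate all resampled sets and train one classifier."""
    X_all, y_all = [], []
    for resample in resamplers:
        Xr, yr = resample(X, y)
        X_all += Xr
        y_all += yr
    model = fit_centroids(X_all, y_all)
    return lambda x: predict(model, x)

# Hypothetical toy data: 12 majority points near (0, 0), 3 minority points near (3, 3).
X = [[0.1 * i, 0.1 * (i % 3)] for i in range(12)] + [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]
y = [0] * 12 + [1] * 3
rng = random.Random(0)
resamplers = [
    lambda X, y: random_undersample(X, y, minority=1, ratio=1.0, rng=rng),
    lambda X, y: random_oversample(X, y, minority=1, ratio=1.0, rng=rng),
    lambda X, y: random_oversample(X, y, minority=1, ratio=0.5, rng=rng),
]
vote_model = multiple_classifier(X, y, resamplers)
merge_model = sample_merging(X, y, resamplers)
print(vote_model([2.9, 3.0]), merge_model([2.9, 3.0]))  # 1 1
print(vote_model([0.2, 0.1]), merge_model([0.2, 0.1]))  # 0 0
```

Three ensemble members keep the binary vote free of ties. With an even number of members, an explicit tie-breaking rule would be needed. Swapping the nearest-centroid learner for different algorithms per member is one way such a pipeline could mix classifiers. Whether that matches the thesis's heterogeneous setting is not stated in the abstract.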
    The experiments show that resampling class-imbalanced datasets with data-level methods during preprocessing effectively improves classification performance, and that the minority-to-majority balance ratio after resampling has a significant effect on classifier performance. In the full comparison of homogeneous and heterogeneous approaches, the multiple-classifier approach and the sample-merging approach (with either a single or an ensemble classifier) show no statistically significant difference. However, heterogeneous approaches are better than homogeneous ones at finding the best pairing for each classification algorithm, which raises classification performance as measured by AUC. These results suggest directions for extending and improving ensemble classifiers and offer additional options and optimization strategies for tackling class imbalance.
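The results above are reported as AUC, the area under the ROC curve. AUC is a ranking measure, not an accuracy. The sketch below uses the pairwise (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The scores are hypothetical, and this is not the thesis's evaluation code. The double loop is O(n_pos × n_neg). Real experiments would normally use a rank-based computation or a library routine such as scikit-learn's `roc_auc_score`.

```python
def auc(y_true, scores, positive=1):
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for 2 minority (1) and 3 majority (0) samples.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.4]
print(auc(y_true, scores))  # 0.75
```

A model that gives every sample the same score gets an AUC of 0.5 however imbalanced the classes are. This is one reason AUC is commonly preferred over plain accuracy for imbalanced data.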
    Appears in Collections: [Graduate Institute of Information Management] Theses and Dissertations

    All items in NCUIR are protected by their original copyright.
