NCU Institutional Repository (中大機構典藏), offering downloads of theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/95429


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95429


    Title: Addressing Class Imbalance in Information Security: Comparative Analysis of Undersampling, Oversampling, and Hybrid Approaches (資訊安全中的類別不平衡:欠採樣、過採樣和混合方法的比較研究)
    Authors: Tseng, Ling-Teng (曾令騰)
    Contributors: Executive Master of Information Management
    Keywords: Information Security; Class Imbalance; Binary; Five-class; Data Resampling
    Date: 2024-05-14
    Issue Date: 2024-10-09 16:51:05 (UTC+8)
    Publisher: National Central University
    Abstract: This study addresses class imbalance in information security through binary and five-class machine learning classification experiments. It compares four classifiers (ANN, KNN, RF, SVM) combined with a range of data resampling techniques: oversampling (Random Oversampling, SMOTE, Borderline SMOTE, ADASYN), undersampling (ENN, Tomek Links), and hybrid methods (SMOTE-ENN, SMOTE-Tomek Links). When working with imbalanced datasets, choosing a suitable model and resampling strategy is crucial for reducing the Type II error rate. A lower Type II error rate means the minority class is identified more reliably, which is critical in applications such as medical diagnosis and information security. For binary classification, the study used information security logs from a case company, Company A, with log entries labeled as either "harmful" or "harmless". Under class imbalance, the most important goal in assessing security risk is reducing Type II errors, that is, cases where an actual security risk is classified as harmless. ANN + Random Oversampling achieved the lowest Type II error rate, 9.09%, well below the rates obtained on the original data (ANN: 81%, KNN: 54%, RF: 24%, SVM: 45%).
For the five-class classification, the study used the well-known KDD99 network intrusion detection dataset, first grouping its 22 attack types into four major attack categories (which, together with normal traffic, give the five classes). Categories 4 (R2L) and 5 (U2R) are extremely imbalanced, and performance on them differed significantly across classifiers. Notably, predictive performance for category 5 improved significantly after oversampling was applied, with the ANN + SMOTE-ENN combination showing the most pronounced improvement. The analysis also indicated that reducing the Type II error rate for minority classes may increase the error rate for majority classes, which highlights the complexity of class imbalance and underscores the importance of selecting a suitable data processing strategy.
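The two ideas the abstract leans on most, random oversampling and the Type II error rate, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the thesis's own code: the function names `random_oversample` and `type2_error_rate`, the `seed` parameter, and the choice to balance every class up to the size of the largest one are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)  # original rows are kept unchanged
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def type2_error_rate(y_true, y_pred, positive):
    """Type II error rate for one class: FN / (FN + TP), i.e. the share of
    truly `positive` samples that were predicted as something else."""
    preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds:
        raise ValueError("y_true contains no samples of the positive class")
    return sum(p != positive for p in preds) / len(preds)
```

For the binary case, `positive` would be the "harmful" label. For the five-class case, the same function can be called once per class (one-vs-rest) to get a per-class miss rate, for example for category 5 (U2R). A common practice, assumed here rather than stated in the abstract, is to resample only the training split so that duplicated rows never leak into the evaluation data.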
    Appears in Collections:[Executive Master of Information Management] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
