NCU Institutional Repository (中大機構典藏) - theses, past exams, journal articles, and research projects: Item 987654321/74799


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/74799


    Title: The Impact of Feature Selection on Different Data Types (特徵屬性篩選對於不同資料類型之影響)
    Authors: 歐先弘;Leo, Hsien-Hung
    Contributors: Executive Master Program of Information Management (資訊管理學系在職專班)
    Keywords: Data Mining;Feature Selection;Classification Algorithm
    Date: 2017-08-21
    Issue Date: 2017-10-27 14:39:41 (UTC+8)
    Publisher: National Central University (國立中央大學)
    Abstract: Feature selection is an important data-preprocessing step in data mining. Its goal, given a dataset, is to remove irrelevant or redundant features through feature selection techniques. The existing literature contains no experiment that pairs each class of feature selection method with the three data types (numerical, categorical, and mixed). This study therefore selects three feature selection techniques, Information Gain (IG), Genetic Algorithm (GA), and Decision Tree (DT), and examines classification performance on each type of dataset with and without feature selection. Forty real-world datasets from different domains were obtained from the UCI repository, and the results are validated on six classifiers: Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Artificial Neural Network (ANN), AdaBoost, and Bagging. By comparing classification accuracy, the study identifies which feature selection method improves which classifier on which kind of dataset, as a reference for analysts running their own experiments.
    According to the results, for categorical data the baseline (no feature selection) gives the best accuracy with any single classifier or with AdaBoost, so the feature selection step is not recommended; under the Bagging ensemble with a KNN base classifier, however, DT feature selection yields better accuracy than the other algorithms. For mixed-type data, GA or DT feature selection (but not IG) beats the baseline accuracy. For numerical data, GA or DT feature selection likewise beats the baseline, except that with MLP the baseline accuracy is best and feature selection is unnecessary. For a given data type and chosen classifier, analysts can consult this study to try the most accurate feature selection method first.
    Feature selection is an important process for pattern recognition applications. Its purpose is to avoid degrading classifier performance: the removed features should be redundant, irrelevant, or of the least possible use. No related study compares different feature selection methods across different data types, such as categorical, numerical, and mixed-type datasets, for classification performance. Therefore, in this thesis, three major feature selection methods were chosen, namely Information Gain (IG), Genetic Algorithm (GA), and Decision Tree (DT), and the research aim is to compare the classification accuracy obtained with each over the different types of datasets. The capability of the approach is illustrated by extensive experiments on 40 real-world datasets from UCI. In addition, six classification techniques are compared: Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Artificial Neural Network (ANN), AdaBoost, and Bagging.
    The experimental results show that the need for feature selection over categorical datasets is weak; however, bagging-based KNN with DT feature selection could improve performance. For mixed-type and numerical datasets, GA and DT feature selection perform better than the baseline, except that when MLP is used there is no need for feature selection on numerical datasets. Overall, the results demonstrate that appropriate feature selection methods can increase the accuracy of some classification models.
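The experimental design described in the abstract, comparing baseline accuracy against accuracy after feature selection for several classifiers, can be sketched as follows. This is a minimal illustration, not the thesis's actual code: scikit-learn is assumed as the toolkit, `load_wine` stands in for one of the 40 UCI datasets, and GA-based selection is omitted because scikit-learn has no built-in genetic-algorithm selector.

```python
# Sketch of the experimental pipeline: compare baseline accuracy against
# accuracy after IG-style and DT-style feature selection, for two classifiers.
# scikit-learn and the wine dataset are stand-ins chosen for illustration.
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # stand-in for one UCI dataset

# IG-style selection: keep the k features with the highest mutual information.
ig_select = SelectKBest(mutual_info_classif, k=5)
# DT-style selection: keep features a decision tree ranks above mean importance.
dt_select = SelectFromModel(DecisionTreeClassifier(random_state=0))

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    # One configuration the results recommend: a Bagging ensemble of KNN.
    "Bagging-KNN": BaggingClassifier(KNeighborsClassifier(),
                                     n_estimators=10, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    baseline = cross_val_score(clf, X, y, cv=5).mean()
    with_ig = cross_val_score(make_pipeline(ig_select, clf), X, y, cv=5).mean()
    with_dt = cross_val_score(make_pipeline(dt_select, clf), X, y, cv=5).mean()
    results[name] = (baseline, with_ig, with_dt)
    print(f"{name}: baseline={baseline:.3f}  IG={with_ig:.3f}  DT={with_dt:.3f}")
```

In the thesis this comparison is repeated over 40 datasets grouped by data type; the sketch above shows only the per-dataset accuracy comparison that underlies those aggregate results.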
    Appears in Collections:[Executive Master of Information Management] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.

