NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/8764


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/8764


    Title: 利用峰點特徵值來分析高解析度蛋白質質譜資料;Analysis of high-resolution protein mass spectra based on peak feature selection
    Authors: 陳聰百;Tsung-Pai Chen
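    Contributors: Executive Master Program, Department of Computer Science and Information Engineering
    Keywords: spectra alignment;peak detection;mass spectrometer;classification;baseline correction;feature selection;SELDI-TOF;MALDI-TOF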
    Date: 2006-06-15
    Issue Date: 2009-09-22 11:34:23 (UTC+8)
    Publisher: National Central University Library
    Abstract: Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry are techniques currently used to identify cancer biomarkers. This work uses two datasets: an ovarian cancer dataset generated by SELDI-TOF from the National Cancer Institute, USA, and an oral cancer dataset generated by MALDI-TOF from the Proteomics Center of Chang Gung University, Taiwan. In both, the samples are divided into a control group and a cancer patient group. The aim of this work is to reduce the high dimensionality of the mass spectra and extract significant peak features for further study. Feature extraction uses baseline subtraction, peak detection, spectra alignment, and normalization. Feature selection uses the Kolmogorov-Smirnov (KS) test, logistic regression, and random forest. After feature selection, three classification methods were applied to the discriminatory peak features to classify the two classes of the ovarian cancer dataset. The 50 and the 100 most discriminatory peak features were each evaluated with 1000 replications of randomized 10-fold cross-validation. Regression tree with bagging, k-nearest neighbor, and support vector machine (SVM) classifiers all yielded good sensitivity, specificity, accuracy, and precision. We also developed a correlation-based query system to identify highly correlated peaks in the cancer and non-cancer groups. The proposed analysis pipeline provides a relatively small peak-feature set that is discriminatory enough for classification and correlation-based studies.
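
The abstract names KS-test feature selection and four evaluation measures without showing how they are computed. The following is a minimal sketch of both steps, not the thesis's implementation: the function names (`ks_statistic`, `rank_peaks`, `binary_metrics`), the label encoding (1 for cancer, 0 for control), and the toy intensity values are all assumptions made for illustration. The classifiers and the repeated 10-fold cross-validation loop are omitted.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_peaks(spectra, labels, top_k):
    """Return the indices of the top_k peak features, ranked by KS statistic.

    spectra: list of rows, one per sample, each a list of peak intensities.
    labels:  1 for cancer, 0 for control (hypothetical encoding).
    """
    scores = []
    for j in range(len(spectra[0])):
        cancer = [row[j] for row, y in zip(spectra, labels) if y == 1]
        control = [row[j] for row, y in zip(spectra, labels) if y == 0]
        scores.append((ks_statistic(cancer, control), j))
    # Highest statistic first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and precision from predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
    }


# Toy data (made up): 6 samples x 3 peak features.
toy_spectra = [
    [5.0, 1.0, 3.0],
    [6.0, 2.0, 1.0],
    [7.0, 1.5, 4.0],
    [1.0, 1.2, 2.0],
    [2.0, 1.8, 3.5],
    [1.5, 1.1, 0.5],
]
toy_labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(rank_peaks(toy_spectra, toy_labels, top_k=2))
    print(binary_metrics(toy_labels, [1, 1, 0, 0, 0, 1]))
```

In the toy data only the first feature separates the two groups completely, so it ranks first. In a real evaluation the ranking would be recomputed inside each training fold to avoid leaking test samples into feature selection; whether the thesis did so is not stated in the abstract.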
    Appears in Collections:[Executive Master of Computer Science and Information Engineering] Electronic Thesis & Dissertation


    All items in NCUIR are protected by copyright, with all rights reserved.
