Antimicrobial Peptide Activity Prediction and Feature Analysis Based on Machine Learning Approaches

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/89825

Title:	Antimicrobial Peptide Activity Prediction and Feature Analysis Based on Machine Learning Approaches
Author:	Lin, Yu-Hsuan (林羽宣)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	antimicrobial peptides; machine learning; minimum inhibitory concentration; activity prediction
Date:	2022-07-27
Upload time:	2022-10-04 12:01:13 (UTC+8)
Publisher:	National Central University
Abstract:	Antimicrobial peptides (AMPs) are small proteins with broad inhibitory effects on bacteria, fungi, parasites, and viruses, which has made them a new class of anti-infective drugs. Many studies on AMPs have appeared in recent years, but few have analyzed the features that most affect their activity. In microbiology, the minimum inhibitory concentration (MIC) is the lowest concentration that inhibits bacterial growth, and it is an important indicator of drug activity. Because experimentally measured MIC values vary widely, the main goal of this study was to build models that predict whether the MIC of an antimicrobial peptide is below 25 μM, and to explore the features that matter most for this activity prediction.
We used five algorithms: Random Forest (RF), Gradient Boosting (GB), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and an ensemble model that integrates these four. Four types of features were used: amino acid composition (AAC), pseudo amino acid composition (PAAC), the distribution descriptor of composition-transition-distribution (CTDD), and composition of k-spaced amino acid pairs (CKSAAP). With these, we built separate models predicting high versus low AMP activity against Escherichia coli (E. coli) and Staphylococcus aureus (S. aureus), and used forward selection to identify important features. The ensemble model achieved better performance, with an area under the ROC curve (AUC) of 0.8608 for E. coli and 0.8308 for S. aureus. Feature analysis showed that AAC, PAAC, and CTDD were the more important feature types. This study therefore indicates that an ensemble of the four machine learning algorithms, combined with AAC, PAAC, and CTDD features, can effectively predict whether an antimicrobial peptide has high or low activity against E. coli or S. aureus.
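The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: all function names are hypothetical, the AAC definition is the common per-residue frequency, the AUC is computed with the standard pairwise-ranking formulation, and the ensemble is assumed to be a simple average of per-model probabilities (soft voting), since the abstract does not say how the four models are combined.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue in the peptide."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def is_active(mic_um, threshold_um=25.0):
    """Binary label: 1 if the MIC (in uM) is below the threshold, else 0."""
    return 1 if mic_um < threshold_um else 0


def soft_vote(prob_lists):
    """ASSUMPTION: combine models by averaging their predicted probabilities."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values invented for illustration only; not data from the thesis.
    features = aac("KKLLKK")
    labels = [is_active(m) for m in (3.0, 12.5, 25.0, 100.0)]
    model_a = [0.9, 0.4, 0.6, 0.1]  # stand-in probabilities from one model
    model_b = [0.7, 0.6, 0.2, 0.3]  # stand-in probabilities from another
    ensemble = soft_vote([model_a, model_b])
    print(len(features), labels, roc_auc(labels, ensemble))
```

In a real pipeline the stand-in probability lists would come from trained RF, GB, LightGBM, and XGBoost classifiers, and the feature vector would be extended with PAAC, CTDD, and CKSAAP descriptors.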
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	50

All items in NCUIR are protected by the original copyright.