分析基因體拷貝數變異所使用的兩種方法比較：隱藏馬可夫模型與成對高斯合併法; A Comparison of Genome Copy Number Variation Analysis using two Methods: Hidden Markov Model and Pair-wise Gaussian Merging

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/43806

Title:	分析基因體拷貝數變異所使用的兩種方法比較：隱藏馬可夫模型與成對高斯合併法;A Comparison of Genome Copy Number Variation Analysis using two Methods: Hidden Markov Model and Pair-wise Gaussian Merging
Authors:	鄭主佑;Chu-yu Cheng
Contributors:	Institute of Systems Biology and Bioinformatics
Keywords:	Copy Number Variation;PGM;whole genome;Gaussian Merging;HMM;Copy Number;whole genome CNV analysis;mouse;CNV;Pair-wise Gaussian Merging;Hidden Markov Model
Date:	2010-07-13
Issue Date:	2010-12-08 14:21:33 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	Whole-genome copy number variation (CNV) has attracted growing attention, and the number of related studies has increased, since the completion of the Human Genome Project (HGP). Experiments that use the mouse as a biological model offer comprehensive microarray data and well-defined copy number differences between strains. We applied two algorithms, the Hidden Markov Model (HMM) and Pair-wise Gaussian Merging (PGM), to determine CNV segments across the mouse genome and found that their results differ substantially in segment length, location, and number. We attribute this to the quite different statistical theories underlying the two algorithms, which lead to different strategies for calling segments. Compared with HMM, PGM yields a wider distribution of segment lengths and a larger number of segments. However, once the shorter segments from both methods are filtered out, the total numbers of segments become almost the same. Future work could compare the two sets of predictions further, for example by identifying the CNV segments that appear in only one result and examining the associated gene symbols and annotations, or could extend the comparison to additional algorithms. Beyond comparing the computation speed and hardware requirements of the algorithms, the methods can also be applied to studies of mouse whole-genome CNV.
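The length filtering and cross-method comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Segment` type, the helper functions, the `MIN_LENGTH` threshold, and the toy coordinates are all hypothetical, chosen only to show how dropping short segments can bring two call sets to similar counts and how segments found by only one method might be listed.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A called CNV segment (hypothetical representation)."""
    chrom: str
    start: int  # inclusive start, in base pairs
    end: int    # exclusive end, in base pairs

    @property
    def length(self) -> int:
        return self.end - self.start


def filter_short(segments: List[Segment], min_length: int) -> List[Segment]:
    """Keep only segments at least min_length base pairs long."""
    return [s for s in segments if s.length >= min_length]


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unique_to(first: List[Segment], second: List[Segment]) -> List[Segment]:
    """Segments in first that overlap no segment in second."""
    return [s for s in first if not any(overlaps(s, t) for t in second)]


# Toy call sets (made-up coordinates, not data from the thesis).
hmm_calls = [Segment("chr1", 1000, 9000), Segment("chr2", 500, 700)]
pgm_calls = [
    Segment("chr1", 2000, 8000),
    Segment("chr1", 20000, 20100),
    Segment("chr3", 100, 50100),
    Segment("chr2", 650, 660),
]

MIN_LENGTH = 1000  # assumed threshold; the abstract does not state one

hmm_kept = filter_short(hmm_calls, MIN_LENGTH)
pgm_kept = filter_short(pgm_calls, MIN_LENGTH)

print("before filtering:", len(hmm_calls), len(pgm_calls))
print("after filtering: ", len(hmm_kept), len(pgm_kept))
print("only in PGM:", unique_to(pgm_kept, hmm_kept))
```

Segments reported by only one method, such as the output of `unique_to`, would be the natural candidates for the gene symbol and annotation lookup that the abstract proposes as future work.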
Appears in Collections:	[Institute of Systems Biology and Bioinformatics] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	934
