
Chapter 5 Conclusions and Future Research Directions

5.2 Future Research Directions

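Although we have obtained a reasonably good scoring function for ranking different regulatory modules, the system we developed does not search over regulatory sequences of different lengths and types. The main reason is that, once variables such as the length and type of the regulatory sequences are added, the current SAMLA cannot complete the search in a reasonable amount of time (polynomial time complexity). Future work will therefore focus on programs and algorithms that explore regulatory modules more comprehensively while still completing the search in reasonable time.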

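SAMLA currently describes the spacing between the regulatory sequences of a module with a very simple probabilistic model of the distances between them. As new findings emerge in bioinformatics, this information can be incorporated to refine the spacing model so that it describes the spacing relationships between regulatory sequences more accurately.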

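Using a more accurate background model is another possible improvement. As described in Chapter 3, the background model we use is a zero-order Markov model. The studies of [Thijs G et al. 2001, 2002] and [Sinha, S. et al. 2000] report that replacing the background model with a higher-order Markov model increases a system's predictive power. A higher-order Markov background model should also help the system discriminate between different regulatory modules, and so improve its predictions.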
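To make the difference between a zero-order and a higher-order background concrete, the following is a minimal sketch, not SAMLA's implementation. The function names `train_background` and `log_background_score`, the add-one pseudocount, and the choice to skip the first `order` bases of a site are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from background sequences.

    order=0 gives the zero-order model (plain base frequencies);
    order>=1 conditions each base on its preceding context.
    Returns a function prob(context, base).
    """
    counts = {}  # context string -> {base: count}
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            per_base = counts.setdefault(context, {})
            per_base[seq[i]] = per_base.get(seq[i], 0.0) + 1.0

    def prob(context, base):
        per_base = counts.get(context, {})
        # Pseudocounts (an assumption here) keep unseen contexts non-zero.
        total = sum(per_base.values()) + pseudocount * len(ALPHABET)
        return (per_base.get(base, 0.0) + pseudocount) / total

    return prob


def log_background_score(site, prob, order):
    """Log-probability of `site` under the background model.

    The first `order` bases have no full context and are skipped,
    a simplification chosen for this sketch.
    """
    return sum(
        math.log(prob(site[i - order:i], site[i]))
        for i in range(order, len(site))
    )
```

With a first-order model trained on an alternating sequence such as `ACACACACAC`, the site `ACAC` receives a higher background log-probability than `AAAA`, whereas a zero-order model only sees the overall base frequencies and cannot capture that dependence.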

References

1. Agrawal,R. and Srikant,R. (1994) Fast algorithms for mining association rules. Proceedings 1994 International Conference VLDB, 487-499.

2. Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. 28-36.

3. Bailey,T.L. and Elkan,C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learn., 21, 51.

4. Crooks,G.E., Hon,G., Chandonia,J.M., and Brenner,S.E. (2004) WebLogo: A sequence logo generator. Genome Research, 14, 1188-1190.

5. Day,W.H. and McMorris,F.R. (1993) The computation of consensus patterns in DNA sequence. Math. Comput. Model., 17, 49-52.

6. GuhaThakurta,D. and Stormo,G.D. (2001) Identifying target sites for cooperatively binding factors. Bioinformatics, 17, 608-621.

7. Favorov,A.V., Gelfand,M.S., Gerasimova,A.V., Ravcheev,D.A., Mironov,A.A., and Makeev,V.J. (2004) A Gibbs sampler for identification of symmetrically

8. Gusfield,D. (1997) Algorithms on strings, trees and sequences. Cambridge University Press.

9. Hu,Y., Sandmeyer,S., McLaughlin,C., and Kibler,D. (2000) Combinatorial motif analysis and hypothesis generation on a genomic scale. Bioinformatics, 16, 222-232.

10. Hu,Y. (2003) Finding subtle motifs with variable gaps in unaligned DNA sequences. Computer Methods and Programs in Biomedicine, 70, 11-20.

11. Grayson,J., Bassel-Duby,R., and Williams,R.S. (1998) Collaborative Interactions Between MEF-2 and Sp1 in Muscle-Specific Gene Regulation. Journal of Cellular Biochemistry, 70, 366-375.

12. Jensen,S.T., Liu,X.S., Zhou,Q., and Liu,J.S. (2004) Computational discovery of gene regulatory binding motifs: a Bayesian perspective. Statistical Science, 19, 188-204.

13. Kirkpatrick,S., Gelatt,C.D.,Jr., and Vecchi,M.P. (1983) Optimization by simulated annealing. Science, 220, 671-680.

14. Lawrence,C.E., Altschul,S.F., Boguski,M.S., Liu,J.S., Neuwald,A.F., and Wootton,J.C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 208-214.

15. Liu,J.S. (1994) The collapsed Gibbs Sampler in Bayesian computations with applications to a gene regulatory problem. J. Amer. Statist. Assoc. 89, 958-966.

16. Liu,J.S., Neuwald,A.F., and Lawrence,C.E. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Amer. Statist. Assoc., 90, 1156-1170.

17. Liu,X., Brutlag,D.L., and Liu,J.S. (2001) BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 127-138.

18. Mitchell,T.M. (1997) Machine Learning. McGraw-Hill.

19. Robison,K., McGuire,A.M., and Church,G.M. (1998) A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K12 genome. Journal of Molecular Biology, 284, 241-254.

20. Sinha,S. and Tompa,M. (2000) A statistical method for finding transcription factor binding sites. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, 344-354.

21. Thijs,G., Lescot,M., Marchal,K., Rombauts,S., De Moor,B., Rouze,P., and Moreau,Y. (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics, 17, 1113-1122.

22. Thijs,G., [...] Moreau,Y. (2002) A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. Journal of Computational Biology, 9, 447-464.

23. van Helden,J., Andre,B., and Collado-Vides,J. (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology, 281, 827-842.

24. van Helden, J., Rios, A. and Collado-Vides, J. (2000) Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res., 28, 1808-1818.

25. Zhu, J. and Zhang, M.Q. (1999) SCPD: A promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15, 607-611.

A. IUPAC Code Table

Code   Description
A      Adenine
C      Cytosine
G      Guanine
T      Thymine
R      Purine (A or G)
Y      Pyrimidine (C, T, or U)
M      C or A
K      T or G
W      T or A
S      C or G
B      C, T or G
D      A, T or G
H      A, T or C
V      A, C or G
N      any base (A, C, G or T)

IUPAC code table for nucleotides
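As an illustration of how the codes in this table can be used when matching a degenerate consensus against DNA, the sketch below expands each code into a regular-expression character class. It is an example only: `iupac_to_regex` and `find_sites` are hypothetical helper names, and Y is restricted to C or T on the assumption that the input is DNA rather than RNA.

```python
import re

# IUPAC nucleotide codes from the table above, restricted to DNA bases
# (assumption: U is omitted because the sequences are DNA).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "W": "AT", "S": "CG", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC consensus string into a regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError on an unknown code
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return the start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]
```

For example, the consensus `TGAYR` becomes `TGA[CT][AG]`, and `find_sites("AY", "ACATAG")` reports the occurrences at positions 0 and 2.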

B. Precision and Sensitivity

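We first define three terms:

A. TP is the number of predictions made by the program that are actually correct.

B. FP is the number of predictions made by the program that are actually incorrect.

C. RA is the number of true (real) answers.

Precision and sensitivity can then be computed from these quantities: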

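\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

\[ \mathrm{Sensitivity} = \frac{TP}{RA} \]

\[ \text{F-Score} = \frac{1}{\frac{1}{2}\left(\frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Sensitivity}}\right)} \]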
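The three measures can be computed directly from TP, FP and RA. The short sketch below follows the definitions in this appendix; the function names and the convention of returning 0.0 when a denominator would be zero are assumptions, not part of the original text.

```python
def precision(tp, fp):
    """Fraction of the program's predictions that are correct."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0  # 0.0 is an assumed convention


def sensitivity(tp, ra):
    """Fraction of the real answers that the program recovered."""
    return tp / ra if ra else 0.0  # 0.0 is an assumed convention


def f_score(p, s):
    """Harmonic mean of precision and sensitivity."""
    if p == 0 or s == 0:
        return 0.0  # the harmonic mean is taken as 0 when either term is 0
    return 1.0 / (0.5 * (1.0 / p + 1.0 / s))
```

For instance, with TP = 8, FP = 2 and RA = 16, precision is 0.8, sensitivity is 0.5, and the F-Score is 0.8 / 1.3, roughly 0.615.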
