Group 1
R02922113 謝名宣 R02922064 黃宥勝 R02922077 黃志恒 R02945040 王亮之
The early introduction of dynamic programming into computational biology
Reference
The early introduction of dynamic programming into computational biology.
David Sankoff
2000 Bioinformatics, 16, 41-47.
Outline
Introduction
Dynamic programming for sequence comparison
Multiple alignment and phylogeny
Secondary structure
Introduction
About the author
David Sankoff
David Sankoff currently holds the Canada Research Chair in Mathematical Genomics at the University of Ottawa.
He studied at McGill University, doing a PhD in Probability Theory with Donald Dawson. He joined the new Centre de recherches mathématiques (CRM) of the University of Montreal in 1969 and was also a professor in the Mathematics and Statistics Department from 1984–2002.
He is one of the founding fathers of bioinformatics, whose fundamental contributions to the area go back to the early 1970s.
About the author
In 1971, Cedergren asked Sankoff to find a way to align RNA sequences. Sankoff knew little of algorithm design and nothing of discrete dynamic programming, but as an undergraduate he had effectively used the latter in working out an economics problem matching buyers and sellers. The same approach worked with alignment.
Introduction
In 1994-1995, DIMACS sponsored a theme year on computational biology.
Participating in a workshop organized by Alberto Apostolico and Raffaele Giancarlo led Sankoff to consider some of the early interactions in the field now known as computational biology.
A paper by Walter Goad on the impact of Stanislaw Ulam in this field puzzled Sankoff greatly.
Introduction
Ulam: 'I started all this'.
Introduction
Sankoff had also read a joint interview of Ulam and Mark Kac. It led him to reflect on this misperception on the part of Ulam, and to crystallize the realization that, ironically, Kac, his colleague of many years, had played a crucial role in the earliest development of the field.
This paper is dedicated to the memory of Mark Kac.
Introduction
In this article, Sankoff draws on his recollections of the earliest phases of the field to describe how certain fundamental ideas found their way into the vernacular of the computational biologist.
Dynamic programming for sequence comparison
Longest common subsequence
Maximum matching
Recurrence relation
Two sequences of lengths m and n, with terms from any alphabet
The two sequences are a(1),…,a(m) and b(1),…,b(n)
Use a^i for the prefix sequence a(1),…,a(i), and M(i,j) for the length of the longest common subsequence of a^i and b^j
Longest common subsequence
initial condition
M(i,0) = M(0,j) = 0
The length of the longest common subsequence
M(m,n)
All longest common subsequences
Traceback routine on the matrix M
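A minimal sketch of the standard longest common subsequence recurrence, consistent with the initial condition above: M(i,j) = M(i-1,j-1) + 1 when a(i) = b(j), and otherwise M(i,j) = max(M(i-1,j), M(i,j-1)). The function names lcs_table and lcs_traceback are illustrative and do not come from the paper, and the traceback shown recovers only one of the possibly many longest common subsequences.

```python
def lcs_table(a, b):
    """Fill M, where M[i][j] is the LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at zero: M(i,0) = M(0,j) = 0.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                M[i][j] = M[i - 1][j - 1] + 1
            else:
                M[i][j] = max(M[i - 1][j], M[i][j - 1])
    return M


def lcs_traceback(a, b, M):
    """Recover one longest common subsequence by walking back from M[m][n]."""
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])      # matched term: step diagonally
            i, j = i - 1, j - 1
        elif M[i - 1][j] >= M[i][j - 1]:
            i -= 1                    # drop a(i)
        else:
            j -= 1                    # drop b(j)
    return "".join(reversed(out))


if __name__ == "__main__":
    a, b = "ACGUAC", "AGUC"
    M = lcs_table(a, b)
    print(M[len(a)][len(b)], lcs_traceback(a, b, M))
```

Filling the table takes time proportional to m times n, and the length of the longest common subsequence is read off at M[m][n].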
Edit distance
Stanislaw Ulam
Sequence comparison problem
Dynamic programming: Sellers
Maximum matching
Cubic computing time
Can be done in quadratic time
Edit distance
Use D(i,j) for the minimum number of edit steps needed to convert the prefix a(1),…,a(i) to the prefix b(1),…,b(j)
Edit distance
initial condition
D(i,0) = D(0,i) = i
The edit distance between the two sequences
D(m,n)
All appropriate sets of edit steps
Traceback routine on the matrix D
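A minimal sketch of the usual unit-cost edit distance recurrence, assuming the three edit steps are insertion, deletion and substitution, each of cost 1: D(i,j) = min(D(i-1,j) + 1, D(i,j-1) + 1, D(i-1,j-1) + [a(i) ≠ b(j)]). The function name edit_distance is illustrative.

```python
def edit_distance(a, b):
    """Return D(m,n), the minimum number of unit-cost edit steps from a to b."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Initial condition: converting a prefix to or from the empty sequence.
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,        # delete a(i)
                D[i][j - 1] + 1,        # insert b(j)
                D[i - 1][j - 1] + sub,  # match or substitute
            )
    return D[m][n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Like the longest common subsequence table, this runs in quadratic time, which is the improvement over cubic time noted on the earlier slide.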
Generalization
A different weight s > 0 for substitutions
The longest common subsequence problem and the shortest edit distance problem become essentially identical when s ≥ 2
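The equivalence can be checked numerically. The sketch below assumes insertions and deletions cost 1 each and substitutions cost s. Once s ≥ 2, a substitution is never cheaper than a deletion followed by an insertion, so the weighted distance equals m + n - 2·M(m,n), where M(m,n) is the longest common subsequence length. The function names are illustrative.

```python
def weighted_edit_distance(a, b, s):
    """Edit distance with indel cost 1 and substitution cost s (an assumption)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))            # row 0: D(0,j) = j
    for i in range(1, m + 1):
        cur = [i] + [0] * n              # column 0: D(i,0) = i
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else s
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub)
        prev = cur
    return prev[n]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n]


if __name__ == "__main__":
    a, b = "ACGUAC", "AGGUCA"
    for s in (1, 2, 3):
        print(s, weighted_edit_distance(a, b, s),
              len(a) + len(b) - 2 * lcs_length(a, b))
```

With s = 1 the two quantities can differ, since a single substitution then replaces a deletion plus an insertion.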
Optimal local alignment
Smith and Waterman
Dynamic Programming
Simple
Not-obvious
Optimal local alignment
initial condition
L(i,0) = L(0,j) = 0
The score of the optimal local alignment between the two sequences
The maximum of L(i,j) over all i and j
All appropriate sets of edit steps
Traceback routine on the matrix L
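A minimal sketch of local alignment in the style of Smith and Waterman. The scoring values (match +1, mismatch -1, gap -1) and the function name local_alignment are assumptions made for this example, not values from the paper. The key difference from global comparison is the extra 0 in the maximum, which lets an alignment start afresh anywhere, so the optimal score is the largest entry of L rather than the corner entry.

```python
def local_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned piece of a, aligned piece of b)."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]   # L(i,0) = L(0,j) = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = L[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            L[i][j] = max(0, diag, L[i - 1][j] + gap, L[i][j - 1] + gap)
            if L[i][j] > best:
                best, best_pos = L[i][j], (i, j)

    # Traceback from the best cell until a zero entry is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and L[i][j] > 0:
        diag = L[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if L[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif L[i][j] == L[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(local_alignment("GGGACGUCCC", "UUACGUAA"))
```

This traceback reports one optimal local alignment; enumerating all of them means following every tied predecessor.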
Multiple alignment and phylogeny
Multiple alignment and phylogeny
Cedergren and Sankoff became interested in assessing the relative rates of the 12 possible substitution mutations among the four bases {A,C,G,U}
Idea:
Isolate each position in the RNA
Count the number of mutations
Combine the data of all positions
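The three steps above can be sketched as follows. This is a simplified illustration, assuming the sequences are already aligned and that each (ancestor, descendant) pair along the tree is known; in practice the ancestral sequences must themselves be reconstructed. The names count_substitutions and edges are hypothetical.

```python
from collections import Counter

BASES = "ACGU"


def count_substitutions(edges):
    """Tally substitutions over aligned (ancestor, descendant) sequence pairs.

    Each position is examined in isolation, and the per-position counts are
    combined into one table keyed by the ordered pair (from_base, to_base).
    """
    totals = Counter()
    for ancestor, descendant in edges:
        if len(ancestor) != len(descendant):
            raise ValueError("sequences must be aligned to equal length")
        for x, y in zip(ancestor, descendant):
            # Skip gaps and unchanged positions; count only base changes.
            if x in BASES and y in BASES and x != y:
                totals[(x, y)] += 1
    return totals


if __name__ == "__main__":
    edges = [("ACGU", "ACGA"), ("ACGU", "GCGU"), ("ACGA", "AC-A")]
    for (x, y), count in sorted(count_substitutions(edges).items()):
        print(f"{x}->{y}: {count}")
```

The resulting table has at most 12 non-zero entries, one for each ordered pair of distinct bases.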
Multiple alignment and phylogeny
The only task:
Align corresponding positions in all sequences
Count the number of mutations in all positions
Multiple alignment and phylogeny
Sankoff published a short paper with Cedergren and his student Cristiane Morel (Sankoff et al., 1973)
Significance of the paper
Mutation frequencies
Reconstruction of the ancestral sequence
Formal algorithm for multiple sequence alignment
Multiple alignment and phylogeny
Sankoff rushed off a manuscript containing this algorithm to Mark Kac, and requested him to communicate it to PNAS
After waiting for 6 months for a reply from Kac…
Not good enough for some cases
Should optimize the tree topology simultaneously
Published his algorithm elsewhere (Sankoff, 1975)
Secondary structure
Secondary structure
Stem
Given two regions a(i),…,a(i+k) and a(j),…,a(j-k), with a(i+h) paired with a(j-h) for h = 0,…,k
[Figure: a stem in which a(i),…,a(i+5) pair with a(j),…,a(j-5)]
Secondary structure
R-loops
Given regions a(i_r),…,a(k_r) that are all unpaired, for r = 1,…,R
R = 1: hairpin
R = 2: interior loop
R ≥ 3: multiple loop
Special case
e.g. bulge (R = 2)
[Figure: loops with R = 1 and R = 2, showing the unpaired regions a(i_1),…,a(k_1) and a(i_2),…,a(k_2)]
Secondary structure
Secondary structure stems disrupted only by bulges and other interior loops could be detected by dynamic programming comparison.
Sankoff devised an iterative algorithm (1976), but the method turned out to be very dependent on the crude energy estimates available at the time.
Secondary structure
A single-pass dynamic programming algorithm was published by Ruth Nussinov. (1978)
Michael Zuker wrote a very effective and widely disseminated program based on Nussinov's principle. (1981)
Mark Kac invited Sankoff to give a talk at Rockefeller University, which re-stimulated his interest in secondary structure.
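To illustrate what a single-pass dynamic program for folding looks like, the sketch below maximizes the number of nested base pairs, in the spirit of Nussinov's approach. It is not an energy-based method and is not the recurrence from the paper. The pairing rule (Watson-Crick pairs plus G-U), the minimum hairpin size min_loop, and the function name max_base_pairs are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (base-pair maximization).

    N[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (i, j) is allowed only if at least min_loop bases lie between them.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    # Process subsequences in order of increasing length (one pass over spans).
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])         # i or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1)    # i pairs with j
            for k in range(i + 1, j):
                best = max(best, N[i][k] + N[k + 1][j])  # split into two parts
            N[i][j] = best
    return N[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The split over k gives cubic running time. Stems, hairpins, interior loops and multiple loops all arise from combinations of the four cases in the inner maximum.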
Secondary structure
The dynamic programming recurrence fundamental to folding may be represented as: