David Sankoff

(1)

Group 1

R02922113 謝名宣 R02922064 黃宥勝 R02922077 黃志恒 R02945040 王亮之

The early introduction of dynamic programming into computational biology

(2)

Reference

The early introduction of dynamic programming into computational biology.

David Sankoff

Bioinformatics, 16, 41–47 (2000).

(3)

Outline

• Introduction

• Dynamic programming for sequence comparison

• Multiple alignment and phylogeny

• Secondary structure

(4)

Introduction

(5)

About the author

• David Sankoff

David Sankoff currently holds the Canada Research Chair in Mathematical Genomics at the University of Ottawa.

He studied at McGill University, where he did a PhD in probability theory with Donald Dawson. He joined the new Centre de recherches mathématiques (CRM) of the University of Montreal in 1969 and was also a professor in the Department of Mathematics and Statistics from 1984 to 2002.

He is one of the founding fathers of bioinformatics, and his fundamental contributions to the area go back to the early 1970s.

(6)

About the author

In 1971, Cedergren asked Sankoff to find a way to align RNA sequences. Sankoff knew little of algorithm design and nothing of discrete dynamic programming, but as an undergraduate he had used the latter effectively to work out an economics problem matching buyers and sellers. The same approach worked for alignment.

(7)

Introduction

In 1994–1995, DIMACS sponsored a theme year on computational biology.

Participating in a workshop organized by Alberto Apostolico and Raffaele Giancarlo led Sankoff to consider some of the early interactions in the field now known as computational biology.

A paper by Walter Goad on the impact of Stanislaw Ulam on this field puzzled Sankoff greatly.

(8)

Introduction

Ulam: ‘I started all this.’

(9)

Introduction

Sankoff had also read a joint interview with Ulam and Mark Kac. It led him to reflect on this misperception on Ulam's part and crystallized the realization that, ironically, Kac, Ulam's colleague of many years, had played a crucial role in the earliest development of the field.

This paper is dedicated to the memory of Mark Kac.

(10)

Introduction

In this article, Sankoff draws on his recollections of the earliest phases of the field to describe how certain fundamental ideas found their way into the vernacular of the computational biologist.

(11)

Dynamic programming for sequence comparison

(12)

Longest common subsequence

• Maximum matching

• Recurrence relation

• Two sequences, of lengths m and n, with terms from any alphabet

• The two sequences are a(1),…,a(m) and b(1),…,b(n)

• For the prefixes a(1),…,a(i) and b(1),…,b(j), let M(i,j) be the length of their longest common subsequence



(13)

Longest common subsequence
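The recurrence on this slide did not survive extraction. The standard longest-common-subsequence recurrence, consistent with the definition of M(i,j) and the initial conditions on the neighboring slides, is reconstructed here as a sketch rather than a copy of the original slide:

```latex
M(i,j) = \max
\begin{cases}
M(i-1,j) \\
M(i,j-1) \\
M(i-1,j-1) + 1 & \text{if } a(i) = b(j)
\end{cases}
\qquad 1 \le i \le m,\; 1 \le j \le n
```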

(14)

Longest common subsequence

• Initial conditions

• M(i,0) = M(0,j) = 0

• The length of the longest common subsequence

• M(m,n)

• All longest common subsequences

• Traceback routine on the matrix M
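A minimal Python sketch of the table fill and the traceback, assuming the sequences are plain strings. The function names `lcs_table` and `all_lcs` are illustrative and do not come from the source. The traceback memoizes on (i, j) and collects every distinct longest common subsequence.

```python
from functools import lru_cache


def lcs_table(a: str, b: str) -> list[list[int]]:
    """Fill M, where M[i][j] is the LCS length of a(1..i) and b(1..j)."""
    m, n = len(a), len(b)
    M = [[0] * (n + 1) for _ in range(m + 1)]  # M(i,0) = M(0,j) = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(M[i - 1][j], M[i][j - 1])
            if a[i - 1] == b[j - 1]:  # a(i) = b(j); Python strings are 0-based
                best = max(best, M[i - 1][j - 1] + 1)
            M[i][j] = best
    return M


def all_lcs(a: str, b: str) -> set[str]:
    """Trace back through M to collect every distinct longest common subsequence."""
    M = lcs_table(a, b)

    @lru_cache(maxsize=None)
    def back(i: int, j: int) -> frozenset[str]:
        # All LCSs (of length M[i][j]) of the prefixes a(1..i) and b(1..j).
        if i == 0 or j == 0:
            return frozenset({""})
        results: set[str] = set()
        if a[i - 1] == b[j - 1] and M[i][j] == M[i - 1][j - 1] + 1:
            results |= {s + a[i - 1] for s in back(i - 1, j - 1)}
        if M[i][j] == M[i - 1][j]:
            results |= back(i - 1, j)
        if M[i][j] == M[i][j - 1]:
            results |= back(i, j - 1)
        return frozenset(results)

    return set(back(len(a), len(b)))
```

For instance, `all_lcs("ACGU", "AGCU")` gives `{"ACU", "AGU"}`, both of length M(4,4) = 3. The number of distinct solutions can grow exponentially with sequence length, so full enumeration is only practical for short inputs.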

(15)

Edit distance

• Stanislaw Ulam

• Sequence comparison problem

• Dynamic programming: Sellers

• Maximum matching

• Cubic computing time

• Can be done in quadratic time

(16)

Edit distance

• Use D(i,j) for the minimum number of edit steps (substitutions, insertions, deletions) needed to convert a(1),…,a(i) into b(1),…,b(j)
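The recurrence for D(i,j) is missing from this slide. Assuming each substitution, insertion, or deletion counts as one step (matching the initial conditions D(i,0) = i and D(0,j) = j on the next slide), the standard recurrence is:

```latex
D(i,j) = \min
\begin{cases}
D(i-1,j) + 1 & \text{(delete } a(i)\text{)} \\
D(i,j-1) + 1 & \text{(insert } b(j)\text{)} \\
D(i-1,j-1) + \delta\bigl(a(i),b(j)\bigr) & \text{(match or substitute)}
\end{cases}
\qquad
\delta(x,y) =
\begin{cases}
0 & x = y \\
1 & x \ne y
\end{cases}
```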



(17)

Edit distance

• D(i,0) = i, D(0,j) = j

• The edit distance between the two sequences

• D(m,n)

• All optimal sets of edit steps

• Traceback routine on the matrix D
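A short Python sketch of the unit-cost edit distance table and a traceback that recovers one optimal set of edit steps. The names `edit_table` and `one_edit_script` and the wording of the step descriptions are illustrative assumptions. Enumerating all optimal scripts would follow every tied branch instead of taking the first.

```python
def edit_table(a: str, b: str) -> list[list[int]]:
    """D[i][j] = minimum number of edit steps turning a(1..i) into b(1..j)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # delete all of a(1..i)
    for j in range(n + 1):
        D[0][j] = j  # insert all of b(1..j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete a(i)
                          D[i][j - 1] + 1,        # insert b(j)
                          D[i - 1][j - 1] + sub)  # match or substitute
    return D


def one_edit_script(a: str, b: str) -> list[str]:
    """Trace back from D(m,n) to recover one optimal sequence of edit steps."""
    D = edit_table(a, b)
    i, j = len(a), len(b)
    steps: list[str] = []
    while i > 0 or j > 0:
        sub = 1 if i > 0 and j > 0 and a[i - 1] != b[j - 1] else 0
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            if sub:
                steps.append(f"substitute {a[i - 1]}->{b[j - 1]} at {i}")
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            steps.append(f"delete {a[i - 1]} at {i}")
            i -= 1
        else:
            steps.append(f"insert {b[j - 1]} after {i}")
            j -= 1
    return steps[::-1]
```

With `a = "kitten"` and `b = "sitting"`, D(6,7) is 3, and the recovered script contains exactly three steps.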

(18)

Generalization

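• Allow a different weight s > 0 for substitutions (insertions and deletions still cost 1)

• The longest common subsequence problem and the shortest edit distance problem become essentially identical

• When s ≥ 2: a substitution then costs at least as much as a deletion plus an insertion, so D(m,n) = m + n − 2M(m,n)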
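The equivalence for s ≥ 2 can be checked numerically. This sketch assumes indels cost 1 and substitutions cost s. The helper names are hypothetical, and the LCS routine uses a single-row version of the M(i,j) recurrence.

```python
def weighted_edit_distance(a: str, b: str, s: float) -> float:
    """Edit distance with insertions/deletions costing 1 and substitutions costing s."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = float(i)
    for j in range(n + 1):
        D[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else s
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + sub)
    return D[m][n]


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence, keeping one row of M at a time."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For "kitten" and "sitting" (LCS length 4) the distance is 3 when s = 1, but 6 + 7 − 2·4 = 5 for every s ≥ 2. At that point substitutions never help, and the edit distance is determined entirely by the LCS.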

(19)

Optimal local alignment

• Smith and Waterman

• Dynamic programming

• Simple

• Not obvious

(20)

Optimal local alignment
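The recurrence for this slide did not survive extraction. A common form of the Smith–Waterman recurrence is shown here, assuming a similarity score σ for aligned terms and a linear gap penalty δ > 0. Smith and Waterman also allow more general gap functions, which this sketch omits. The floor at 0 is what lets an alignment start anywhere.

```latex
L(i,j) = \max
\begin{cases}
0 \\
L(i-1,j-1) + \sigma\bigl(a(i),b(j)\bigr) \\
L(i-1,j) - \delta \\
L(i,j-1) - \delta
\end{cases}
\qquad
\text{best local score} = \max_{i,j} L(i,j)
```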

(21)

Optimal local alignment

• L(i,0) = L(0,j) = 0

• The score of the optimal local alignment between the two sequences

• The maximum of L(i,j) over all i and j

• All optimal local alignments

• Traceback routine on the matrix L, from a maximal entry back to the first zero entry
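A compact Python sketch of local alignment with traceback, assuming a linear gap penalty and illustrative scores (match +2, mismatch −1, gap 1). These values are assumptions, not taken from the slides. Only one optimal alignment is returned, from the first maximal cell found.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = 1) -> tuple[int, str, str]:
    """Return the best local alignment score and one optimal local alignment.

    The scores (match bonus, mismatch score, linear gap penalty) are
    illustrative defaults, not values taken from the source.
    """
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L(i,0) = L(0,j) = 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            L[i][j] = max(0,                      # start a fresh local alignment
                          L[i - 1][j - 1] + sigma,
                          L[i - 1][j] - gap,      # gap in b
                          L[i][j - 1] - gap)      # gap in a
            if L[i][j] > best:
                best, best_i, best_j = L[i][j], i, j

    # Trace back from a maximal cell until a zero entry ends the alignment.
    top: list[str] = []
    bottom: list[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and L[i][j] > 0:
        sigma = match if a[i - 1] == b[j - 1] else mismatch
        if L[i][j] == L[i - 1][j - 1] + sigma:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif L[i][j] == L[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("TTACGTAA", "GGACGTCC")` returns `(8, "ACGT", "ACGT")`. The shared core is found even though the flanks differ completely.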



(22)

Multiple alignment and phylogeny

(23)

Multiple alignment and phylogeny

• Cedergren and Sankoff became interested in assessing the relative rates of the 12 possible substitution mutations among the four bases {A,C,G,U}

• Idea:

• Isolate each position in the RNA

• Count the number of mutations

• Combine the data of all positions

(24)

Multiple alignment and phylogeny

• The only tasks:

• Align corresponding positions in all sequences

• Count the number of mutations in all positions
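The counting step can be sketched as follows. The sketch assumes that the alignment is already done and that each tree edge is supplied as a hypothetical (ancestor, descendant) pair of aligned sequences; inferring those ancestors is a separate problem. Each position is compared in isolation, and the counts are pooled into the 12 directed substitution types.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"


def tally_substitutions(edges: list[tuple[str, str]]) -> Counter:
    """Count the 12 directed substitution types X->Y (X != Y) over tree edges.

    Each edge is a hypothetical (ancestor, descendant) pair of aligned,
    equal-length RNA sequences; positions are compared one at a time and the
    per-position counts are pooled into a single table.
    """
    counts: Counter = Counter({(x, y): 0 for x, y in permutations(BASES, 2)})
    for anc, desc in edges:
        if len(anc) != len(desc):
            raise ValueError("sequences on an edge must be aligned to equal length")
        for x, y in zip(anc, desc):
            # Skip identical bases and any non-ACGU symbols such as gaps.
            if x != y and x in BASES and y in BASES:
                counts[(x, y)] += 1
    return counts
```

Dividing each count by the number of ancestral positions holding base X gives a crude relative rate, although such naive counts ignore multiple hits at the same position.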



(25)

Multiple alignment and phylogeny

• Sankoff published a short paper with Cedergren and his student Cristiane Morel (Sankoff et al., 1973)

• Significance of the paper

• Mutation frequencies

• Reconstruction of the ancestral sequence

• Formal algorithm for multiple sequence alignment

(26)

Multiple alignment and phylogeny

• Sankoff rushed off a manuscript containing this algorithm to Mark Kac and asked him to communicate it to PNAS

• After waiting six months for a reply from Kac…

• Not good enough for some cases

• Should optimize the tree topology simultaneously

• Published his algorithm elsewhere (Sankoff, 1975)
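The per-position idea (isolate each position, solve it on the tree, combine) can be illustrated with the weighted small-parsimony dynamic program commonly associated with Sankoff. The sketch makes several assumptions. The tree topology is fixed rather than optimized, the sequences are already aligned without gaps, substitution costs are unit, and the tree uses a hypothetical encoding as nested pairs of leaf names. It is an illustration of the technique, not a reconstruction of the 1975 publication.

```python
BASES = "ACGU"
INF = float("inf")


def subst_cost(x: str, y: str) -> int:
    """Hypothetical unit cost; a weighted 4x4 table could be used instead."""
    return 0 if x == y else 1


def site_costs(node, column: dict[str, str]) -> dict[str, float]:
    """For one aligned position: minimal mutations below `node` per root base.

    `node` is a leaf name (str) or a (left, right) tuple; `column` maps leaf
    names to the base observed at this position.
    """
    if isinstance(node, str):
        return {b: 0 if b == column[node] else INF for b in BASES}
    left, right = (site_costs(child, column) for child in node)
    return {
        b: min(left[x] + subst_cost(b, x) for x in BASES)
        + min(right[y] + subst_cost(b, y) for y in BASES)
        for b in BASES
    }


def tree_mutations(tree, seqs: dict[str, str]) -> tuple[float, str]:
    """Isolate each position, solve it on the fixed tree, and combine.

    Returns the total minimal number of mutations and one most
    parsimonious root (ancestral) sequence, breaking ties in ACGU order.
    """
    length = len(next(iter(seqs.values())))
    total, root = 0.0, []
    for pos in range(length):
        costs = site_costs(tree, {name: s[pos] for name, s in seqs.items()})
        best = min(BASES, key=lambda b: costs[b])
        total += costs[best]
        root.append(best)
    return total, "".join(root)
```

With the tree `(("s1", "s2"), ("s3", "s4"))` and sequences ACGU, ACGA, ACCU, ACCU, the total is 2 mutations and the reconstructed root is ACCU. The slide's criticism still applies: a fixed tree may be a poor choice, and the topology ideally should be optimized at the same time.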

(27)

Secondary structure

(28)

Secondary structure

• Stem

Two regions a(i),…,a(i+k) and a(j),…,a(j−k), in which a(i+h) pairs with a(j−h) for h = 0,…,k

[Figure: stem pairing a(i),…,a(i+5) with a(j),…,a(j−5)]
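The stem definition can be checked directly. This sketch assumes Watson–Crick pairs plus the G–U wobble pair, and it requires at least three unpaired bases between the two strands. Both the pairing rule and the minimum loop size are assumptions, not taken from the slide. Positions are 1-based to match a(i).

```python
# Hypothetical pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum number of unpaired bases between the strands


def is_stem(seq: str, i: int, j: int, k: int) -> bool:
    """True if a(i+h) pairs with a(j-h) for every h = 0..k (1-based i, j)."""
    if not (1 <= i and j <= len(seq) and i + k < j - k):
        return False
    return all((seq[i + h - 1], seq[j - h - 1]) in PAIRS for h in range(k + 1))


def find_stems(seq: str, k: int) -> list[tuple[int, int]]:
    """All (i, j) that start a stem of k+1 stacked pairs with a long enough loop."""
    n = len(seq)
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if (j - k) - (i + k) - 1 >= MIN_LOOP and is_stem(seq, i, j, k)]
```

For "GGGAAACCC" and k = 2, the only stem found is (1, 9). It pairs G1–C9, G2–C8, and G3–C7 around the AAA loop.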

(29)

Secondary structure



R-loops

Regions a(i^r),…,a(k^r) are all unpaired, for r = 1,…,R

R = 1: hairpin

R = 2: interior loop

R ≥ 3: multiple loop

Special case: bulge (R = 2, with one of the two unpaired regions empty)

[Figure: example loops for R = 1 and R = 2, with unpaired regions a(i^1),…,a(k^1) and a(i^2),…,a(k^2)]
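A small sketch that classifies the loop closed by a given base pair according to this R-loop scheme. It assumes the structure is written in dot-bracket notation, a convention the source does not use. Because R counts the unpaired regions (some possibly empty), R equals one plus the number of pairs directly enclosed. The sketch labels R = 2 with both regions empty as a plain stacked pair, and R = 2 with exactly one region empty as a bulge.

```python
def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position (1-based) to its partner, from dot-bracket text."""
    stack: list[int] = []
    partner: dict[int, int] = {}
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opening = stack.pop()
            partner[opening], partner[pos] = pos, opening
    return partner


def classify_loop(structure: str, i: int) -> str:
    """Classify the loop closed by the pair (i, j), where i is the 5' (opening) end.

    Walks from i+1 to j-1, measuring each unpaired region and jumping over
    every directly enclosed pair; R = number of unpaired regions (some may be empty).
    """
    partner = pair_table(structure)
    j = partner[i]
    regions: list[int] = []
    size, k = 0, i + 1
    while k < j:
        if k in partner:            # inner pair (k, partner[k]) ends a region
            regions.append(size)
            size, k = 0, partner[k] + 1
        else:                       # unpaired base extends the current region
            size, k = size + 1, k + 1
    regions.append(size)            # region between the last inner pair and j
    R = len(regions)
    if R == 1:
        return "hairpin"
    if R == 2:
        if regions == [0, 0]:
            return "stacked pair (stem)"
        return "bulge" if 0 in regions else "interior loop"
    return "multiple loop"
```

For example, the pair at position 1 of "(.(...))" closes a bulge, while the pair at position 1 of "((...)(...))" closes a multiple loop with R = 3.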

(30)

Secondary structure




(31)

Secondary structure



• Secondary structure stems disrupted only by bulges and other interior loops could be detected by dynamic programming comparison.

• Sankoff devised an iterative algorithm (1976), but the method turned out to be very dependent on the crude energy estimates available at the time.
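The first point on this slide can be illustrated by treating a stem as a comparison between one strand and the reverse of the other. This is a hypothetical sketch, not Sankoff's 1976 iterative algorithm. It reuses local-alignment scoring and counts a complementary pair (A–U, G–C, G–U assumed) as a match. A gap on one strand stands for a bulge, and a mismatch stands for a small interior loop. The scores are arbitrary illustrative values, not energies.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def best_helix(left: str, right: str, pair: int = 2, unpaired: int = -1,
               gap: int = 2) -> int:
    """Score the best stem formed by `left` and `right`, both written 5'->3'.

    `right` is reversed so that pairing a base of `left` with a base of
    `right` becomes a diagonal step, as in sequence comparison. A gap on one
    strand models a bulge, and mismatches model small interior loops.
    Local scoring (floor at 0) keeps only the best-scoring stretch.
    """
    r = right[::-1]
    m, n = len(left), len(r)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = pair if (left[i - 1], r[j - 1]) in PAIRS else unpaired
            S[i][j] = max(0, S[i - 1][j - 1] + step,
                          S[i - 1][j] - gap, S[i][j - 1] - gap)
            best = max(best, S[i][j])
    return best
```

Here `best_helix("GGGG", "CCCC")` scores 8 for four stacked pairs. `best_helix("GGAGG", "CCCC")` scores 6, because the extra A is best treated as a one-base bulge. The example also shows the weakness the slide mentions: the result depends entirely on the crude scores chosen.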

(32)

Secondary structure

• A single-pass dynamic programming algorithm was published by Ruth Nussinov (1978).

• Michael Zuker wrote a very effective and widely disseminated program based on Nussinov's principle (1981).

• Mark Kac invited him to give a talk at Rockefeller University, which re-stimulated his interest in secondary structure.
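A minimal sketch of single-pass base-pair maximization in the style of Nussinov's algorithm. The sketch assumes A–U, G–C, and G–U pairs and a hypothetical minimum hairpin loop of 3 unpaired bases. It uses the common textbook form of the recurrence, in which base j is either unpaired or paired with some k. Zuker's program minimizes free energy instead of counting pairs, and that is not shown here.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum number of unpaired bases closing a hairpin


def nussinov(seq: str) -> tuple[int, str]:
    """Maximum number of nested base pairs and one optimal dot-bracket structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j]: maximum number of pairs within seq[i..j] (0-based, inclusive)
    N = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best

    structure = ["."] * n

    def trace(i: int, j: int) -> None:
        # Re-derive which case produced N[i][j] and recurse into the pieces.
        if j - i <= MIN_LOOP:
            return
        if N[i][j] == N[i][j - 1]:
            trace(i, j - 1)
            return
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if N[i][j] == left + N[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        trace(i, k - 1)
                    trace(k + 1, j - 1)
                    return

    trace(0, n - 1)
    return N[0][n - 1], "".join(structure)
```

For "GGGAAACCC" this yields `(3, "(((...)))")`. The same table-filling pattern underlies energy-based folding, which replaces the pair count with loop-dependent energy terms.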

(33)

Secondary structure

