Group 1

R02922113 謝名宣 R02922064 黃宥勝 R02922077 黃志恒 R02945040 王亮之

### The early introduction of dynamic programming into computational biology

### Reference

### The early introduction of dynamic programming into computational biology.

### David Sankoff

### Bioinformatics, 16, 41-47 (2000).

### Outline

### Introduction

### Dynamic programming for sequence comparison

### Multiple alignment and phylogeny

### Secondary structure

### Introduction

### About the author

David Sankoff

David Sankoff currently holds the Canada Research Chair in Mathematical Genomics at the University of Ottawa.

He studied at McGill University, doing a PhD in Probability Theory with Donald Dawson. He joined the new Centre de recherches mathématiques (CRM) of the University of Montreal in 1969 and was also a professor in the Mathematics and Statistics Department from 1984 to 2002.

He is one of the founding fathers of bioinformatics, whose fundamental contributions to the area go back to the early 1970s.

### About the author

In 1971, Cedergren asked Sankoff to find a way to align RNA sequences. Sankoff knew little of algorithm design and nothing of discrete dynamic programming, but as an undergraduate he had effectively used the latter in working out an economics problem matching buyers and sellers. The same approach worked with alignment.

### Introduction

In 1994-1995, DIMACS sponsored a theme year on computational biology.

Participating in a workshop organized by Alberto Apostolico and Raffaele Giancarlo led Sankoff to consider some of the early interactions in the field now known as computational biology.

A paper by Walter Goad on the impact of Stanislaw Ulam on this field puzzled Sankoff greatly.

### Introduction

## Ulam: ‘I started all this’.

### Introduction

Sankoff had also read a joint interview of Ulam and Mark Kac, which led him to reflect on this misperception on the part of Ulam, and to crystallize the realization that, ironically, Kac, his colleague of many years, had played a crucial role in the earliest development of the field.

### This paper is dedicated to the memory of Mark Kac.

### Introduction

In this article, Sankoff draws on his recollections of the earliest phases of the field to describe how certain fundamental ideas found their way into the vernacular of the computational biologist.

### Dynamic programming for sequence comparison

### Longest common subsequence

Maximum matching

Recurrence relation

Two sequences, of lengths m and n, of terms from any alphabet

The two sequences are a(1),…,a(m) and b(1),…,b(n)

Use a_i for the prefix sequence a(1),…,a(i), b_j for the prefix sequence b(1),…,b(j), and M(i,j) for the length of the longest common subsequence of a_i and b_j

### Longest common subsequence
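The recurrence shown on this slide did not survive extraction. The standard longest-common-subsequence recurrence, consistent with the definition of M(i,j) and the initial condition M(i,0) = M(0,j) = 0, is given here as a reconstruction rather than a copy of the original slide:

$$
M(i,j) = \max\begin{cases}
M(i-1,j)\\
M(i,j-1)\\
M(i-1,j-1) + 1 & \text{if } a(i) = b(j)
\end{cases}
\qquad 1 \le i \le m,\; 1 \le j \le n
$$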

### Longest common subsequence

initial condition

M(i,0) = M(0,j) = 0

The length of the longest common subsequence

M(m,n)

All longest common subsequences

Traceback routine on the matrix M
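A minimal Python sketch of the fill and traceback described above, assuming the sequences are plain strings; the function names `lcs_matrix` and `all_lcs` are hypothetical. Python strings are 0-based, so a(i) on the slides corresponds to `a[i - 1]` in the code.

```python
from functools import lru_cache


def lcs_matrix(a: str, b: str) -> list[list[int]]:
    """Fill M, where M[i][j] is the LCS length of a(1..i) and b(1..j)."""
    m, n = len(a), len(b)
    M = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(M[i - 1][j], M[i][j - 1])
            if a[i - 1] == b[j - 1]:  # a(i) = b(j) in the slides' 1-based notation
                best = max(best, M[i - 1][j - 1] + 1)
            M[i][j] = best
    return M


def all_lcs(a: str, b: str) -> set[str]:
    """Trace back from M(m, n) to collect every distinct longest common subsequence."""
    M = lcs_matrix(a, b)

    @lru_cache(maxsize=None)
    def back(i: int, j: int) -> frozenset:
        if i == 0 or j == 0:
            return frozenset({""})
        result = set()
        # follow every predecessor cell that explains the value M[i][j]
        if a[i - 1] == b[j - 1] and M[i][j] == M[i - 1][j - 1] + 1:
            result |= {s + a[i - 1] for s in back(i - 1, j - 1)}
        if M[i][j] == M[i - 1][j]:
            result |= back(i - 1, j)
        if M[i][j] == M[i][j - 1]:
            result |= back(i, j - 1)
        return frozenset(result)

    return set(back(len(a), len(b)))
```

For example, `all_lcs("AGCAT", "GAC")` returns the set {"GA", "GC", "AC"}, and `lcs_matrix("AGCAT", "GAC")[-1][-1]` is 2. Memoizing `back` keeps the traceback from revisiting cells, although the number of distinct subsequences returned can still grow quickly.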

### Edit distance

Stanislaw Ulam

Sequence comparison problem

Dynamic Programming-Sellers

Maximum matching

Cubic computing time

Can be done in quadratic time

### Edit distance

Use D(i,j) for the minimum number of edit steps (insertions, deletions and substitutions) needed to convert a(1),…,a(i) into b(1),…,b(j)
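The recurrence on this slide was also lost in extraction. The usual unit-cost edit-distance recurrence, consistent with the initial conditions D(i,0) = i and D(0,j) = j, is reconstructed below; δ(i,j) is notation introduced here:

$$
D(i,j) = \min\begin{cases}
D(i-1,j) + 1 & \text{delete } a(i)\\
D(i,j-1) + 1 & \text{insert } b(j)\\
D(i-1,j-1) + \delta(i,j) & \text{match or substitute}
\end{cases}
\qquad
\delta(i,j) = \begin{cases}0 & a(i) = b(j)\\ 1 & a(i) \neq b(j)\end{cases}
$$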

### Edit distance

initial condition

D(i,0) = i, D(0,j) = j

The edit distance between the two sequences

D(m,n)

All optimal sets of edit steps

Traceback routine on the matrix D
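A minimal Python sketch of the edit-distance matrix and of one traceback, assuming unit costs for insertion, deletion and substitution; `edit_matrix` and `one_edit_script` are hypothetical names. The traceback returns one optimal set of edit steps; collecting all of them would branch exactly as the LCS traceback does.

```python
def edit_matrix(a: str, b: str) -> list[list[int]]:
    """Fill D, where D[i][j] is the unit-cost edit distance from a(1..i) to b(1..j)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # delete all of a(1..i)
    for j in range(n + 1):
        D[0][j] = j  # insert all of b(1..j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + cost)
    return D


def one_edit_script(a: str, b: str) -> list[tuple]:
    """Trace back from D(m, n) and return one optimal list of edit steps (matches omitted)."""
    D = edit_matrix(a, b)
    i, j = len(a), len(b)
    steps = []
    while i > 0 or j > 0:
        mismatch = int(i > 0 and j > 0 and a[i - 1] != b[j - 1])
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + mismatch:
            if mismatch:
                steps.append(("substitute", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            steps.append(("delete", a[i - 1]))
            i -= 1
        else:
            steps.append(("insert", b[j - 1]))
            j -= 1
    return steps[::-1]
```

For instance, `edit_matrix("kitten", "sitting")[-1][-1]` is 3, and `one_edit_script` returns three steps for that pair.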

### Generalization

Give substitutions a different weight s > 0 (insertions and deletions keep weight 1)

When s ≥ 2, the longest common subsequence problem and the shortest edit distance problem become essentially identical
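Why s ≥ 2 is the threshold: a substitution then costs at least as much as a deletion plus an insertion, so an optimal edit script never needs substitutions, and the distance becomes D(m,n) = m + n − 2M(m,n). The Python sketch below checks that identity numerically; the function names and the test pairs are hypothetical illustrations.

```python
def weighted_edit_distance(a: str, b: str, s: float) -> float:
    """Edit distance with insertions/deletions of weight 1 and substitutions of weight s."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = float(i)
    for j in range(n + 1):
        D[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else s
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + sub)
    return D[m][n]


def lcs_length(a: str, b: str) -> int:
    """Length M(m, n) of a longest common subsequence."""
    m, n = len(a), len(b)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                M[i][j] = M[i - 1][j - 1] + 1
            else:
                M[i][j] = max(M[i - 1][j], M[i][j - 1])
    return M[m][n]


# With s >= 2 the weighted distance is fixed by the LCS length: D = m + n - 2M.
for a, b in [("AGCAT", "GAC"), ("kitten", "sitting")]:
    for s in (2, 3):
        assert weighted_edit_distance(a, b, s) == len(a) + len(b) - 2 * lcs_length(a, b)
```

With s = 1 the two problems differ: for "AGCAT" and "GAC" the distance is 3 (delete A, delete C, substitute T by C), while with s = 2 it is 4 = 5 + 3 − 2·2.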

### Optimal local alignment

Smith and Waterman

Dynamic Programming

Simple

Not-obvious

### Optimal local alignment
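The Smith-Waterman recurrence on this slide was lost in extraction. Its usual form with a linear gap penalty is reconstructed below; the similarity score σ (positive for matches, negative for mismatches) and the gap penalty g > 0 are assumed symbols. The 0 term lets an alignment restart anywhere, which is what makes it local, and it is consistent with the initial condition L(i,0) = L(0,j) = 0:

$$
L(i,j) = \max\begin{cases}
0\\
L(i-1,j-1) + \sigma(a(i),b(j))\\
L(i-1,j) - g\\
L(i,j-1) - g
\end{cases}
$$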

### Optimal local alignment

initial condition

L(i,0) = L(0,j) = 0

The score of the optimal local alignment between the two sequences

The maximum of L(i,j) over all i, j

All optimal local alignments

Traceback routine on the matrix L, from a maximal entry back to the first zero entry
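A minimal Python sketch of local alignment with traceback, assuming hypothetical scores (match +2, mismatch −1, gap penalty 1); these values are illustrative, not taken from the original. The traceback starts at the first maximal entry found and stops at the first zero entry.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = 1):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L(i,0) = L(0,j) = 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            L[i][j] = max(0, L[i - 1][j - 1] + sigma, L[i - 1][j] - gap, L[i][j - 1] - gap)
            if L[i][j] > best:
                best, best_i, best_j = L[i][j], i, j

    # Trace back from the maximal cell until a zero entry ends the local alignment.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and L[i][j] > 0:
        sigma = match if a[i - 1] == b[j - 1] else mismatch
        if L[i][j] == L[i - 1][j - 1] + sigma:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif L[i][j] == L[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("XXACGTXX", "YYACGTYY")` returns `(8, "ACGT", "ACGT")`: the shared core is found even though the flanks never match.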

### Multiple alignment and phylogeny

### Multiple alignment and phylogeny

Cedergren and Sankoff became interested in assessing the relative rates of the 12 possible substitution mutations among the four bases {A,C,G,U}

Idea:

Isolate each position in the RNA

Count the number of mutations

Combine the data of all positions
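The position-by-position idea can be sketched as a small dynamic program on a fixed tree, in the spirit of the weighted-parsimony recurrence usually attributed to Sankoff (1975). This is a minimal Python sketch under several assumptions: the tree is a hypothetical rooted binary tree written as nested tuples, the alignment is gap-free, and every one of the 12 substitutions has weight 1 (`unit_cost` is a hypothetical placeholder for the relative rates being estimated).

```python
BASES = "ACGU"


def unit_cost(x: str, y: str) -> int:
    """Hypothetical weighting: every one of the 12 substitutions counts as one mutation."""
    return 0 if x == y else 1


def min_mutations(tree, cost=unit_cost) -> int:
    """Minimum weighted number of mutations at ONE position on a fixed rooted binary tree.

    A leaf is the base observed at this position; an internal node is a (left, right) tuple.
    """
    def subtree_costs(node) -> dict:
        # subtree_costs(node)[x]: cheapest explanation of this subtree if `node` carries base x
        if isinstance(node, str):
            return {x: 0 if x == node else float("inf") for x in BASES}
        left, right = (subtree_costs(child) for child in node)
        return {
            x: min(left[y] + cost(x, y) for y in BASES) + min(right[z] + cost(x, z) for z in BASES)
            for x in BASES
        }

    return int(min(subtree_costs(tree).values()))


def column(tree, alignment: dict, k: int):
    """Replace each leaf name by the base that sequence has at aligned position k."""
    if isinstance(tree, str):
        return alignment[tree][k]
    return tuple(column(child, alignment, k) for child in tree)


def total_mutations(tree, alignment: dict) -> int:
    """Isolate each position, count its mutations, then combine (sum) over all positions."""
    length = len(next(iter(alignment.values())))
    return sum(min_mutations(column(tree, alignment, k)) for k in range(length))
```

With the hypothetical tree `(("s1", "s2"), ("s3", "s4"))` and sequences s1 = "ACGU", s2 = "ACGA", s3 = "GCGA", s4 = "GCGA", the first and last positions each need one mutation, so `total_mutations` returns 2. Recording which base each internal node takes would also yield an ancestral sequence and the substitution types needed for rate estimates.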

### Multiple alignment and phylogeny

The only task:

Align corresponding positions in all sequences

Count the number of mutations in all positions

### Multiple alignment and phylogeny

Sankoff published a short paper with Cedergren and his student Cristiane Morel (Sankoff et al., 1973)

Significance of the paper

Mutation frequencies

Reconstruction of the ancestral sequence

Formal algorithm for multiple sequence alignment

### Multiple alignment and phylogeny

Sankoff rushed off a manuscript containing this algorithm to Mark Kac and asked him to communicate it to PNAS

After waiting six months for a reply from Kac…

Not good enough for some cases

Should optimize the tree topology simultaneously

Published his algorithm elsewhere (Sankoff, 1975)

### Secondary structure

### Secondary structure

### Stem

### Given two regions a(i),…,a(i+k) and a(j),…,a(j-k), in which a(i+h) pairs with a(j-h) for h = 0,…,k

[Figure: a stem pairing a(i),…,a(i+5) with a(j),…,a(j-5)]
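The stem definition above can be checked directly. A minimal Python sketch, assuming Watson-Crick pairs plus the G-U wobble pair as the pairing rules (the slide does not state them) and the slides' 1-based positions; `is_stem` and `CAN_PAIR` are hypothetical names.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_stem(seq: str, i: int, j: int, k: int) -> bool:
    """True if a(i+h) can pair with a(j-h) for every h = 0, ..., k.

    Positions are 1-based, as on the slide; the two regions must not overlap.
    """
    if i < 1 or j > len(seq) or i + k >= j - k:
        return False
    return all((seq[i + h - 1], seq[j - h - 1]) in CAN_PAIR for h in range(k + 1))
```

For example, in "GGGAAAUCCC" the regions a(1),…,a(3) and a(10),…,a(8) form a stem, so `is_stem("GGGAAAUCCC", 1, 10, 2)` is `True`.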

### Secondary structure

### R-loops

### Given that $a(i^r),\ldots,a(k^r)$ are all unpaired, for $r = 1,\ldots,R$

### R = 1: Hairpin

### R = 2: Interior loop

### R ≥ 3: Multiple loop

### Special case

### e.g. bulge (R = 2)

[Figure: example loops for R = 1 and R = 2, with unpaired segments labelled $a(i^1),\ldots,a(k^1)$ and $a(i^2),\ldots,a(k^2)$]
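The R-loop classification can be sketched for a known set of nested base pairs. This minimal Python sketch assumes that R counts the pairs bounding the loop (the closing pair plus each pair directly inside it), so that the loop has R unpaired segments, some possibly empty; under that reading a bulge is an R = 2 loop with one empty segment. `classify_loop` and its labels are hypothetical.

```python
def classify_loop(pairs, i: int, j: int) -> dict:
    """Classify the loop closed by base pair (i, j).

    `pairs` is a list of nested (non-crossing) base pairs (p, q) with p < q.
    R counts the pairs bounding the loop: (i, j) plus every pair directly inside it,
    so the loop has R unpaired segments, some of which may be empty.
    """
    partner = {}
    for p, q in pairs:
        partner[p], partner[q] = q, p
    assert partner.get(i) == j, "(i, j) must be one of the base pairs"

    segments, run, pos = [], 0, i + 1
    while pos < j:
        if pos in partner:                 # left end of a pair directly inside (i, j)
            segments.append(run)
            run = 0
            pos = partner[pos] + 1         # skip everything that pair encloses
        else:
            run += 1
            pos += 1
    segments.append(run)

    R = len(segments)
    if R == 1:
        kind = "hairpin"
    elif R == 2:
        if segments == [0, 0]:
            kind = "stacked pair (stem)"
        elif 0 in segments:
            kind = "bulge"
        else:
            kind = "interior loop"
    else:
        kind = "multiple loop"
    return {"R": R, "segments": segments, "kind": kind}
```

For example, with pairs (1,12), (3,11), (4,10), the loop closed by (1,12) has one unpaired base on one side and none on the other, so it is reported as a bulge with R = 2.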


### Secondary structure

### Secondary structure stems disrupted only by bulges and other interior loops could be detected by dynamic programming comparison.

### Sankoff devised an iterative algorithm (1976), but the method turned out to be very dependent on the crude energy estimates of the time.

### Secondary structure

A single-pass dynamic programming algorithm was published by Ruth Nussinov (1978).

Michael Zuker wrote a very effective and widely disseminated program based on Nussinov's principle (1981).

Mark Kac invited Sankoff to give a talk at Rockefeller University, which re-stimulated Sankoff's interest in secondary structure.
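In its simplest textbook form, the single-pass idea attributed here to Nussinov maximizes the number of nested base pairs. A minimal Python sketch, assuming Watson-Crick plus G-U wobble pairing and a hypothetical minimum hairpin size `min_loop = 3`; Zuker's program minimizes free energy with much richer loop rules, which this sketch does not attempt.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style base-pair maximization).

    N[i][j] is the best count for seq[i..j] (0-based, inclusive); a pair (i, k) must
    enclose at least `min_loop` unpaired positions.
    """
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):                # span = j - i, shortest intervals first
        for i in range(n - span):
            j = i + span
            best = N[i + 1][j]                         # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):   # case 2: i pairs with k
                if (seq[i], seq[k]) in CAN_PAIR:
                    inside = N[i + 1][k - 1]
                    after = N[k + 1][j] if k < j else 0
                    best = max(best, inside + 1 + after)
            N[i][j] = best
    return N[0][n - 1]
```

For "GGGAAAUCCC" the sketch returns 3, the three G-C pairs closing the AAAU hairpin. Filling intervals in order of increasing length is what makes a single pass suffice: every subinterval a cell depends on has already been computed.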

### Secondary structure

### The dynamic programming recurrence fundamental to folding may be represented as: