David Sankoff


Academic year: 2022


(1)

Group 1

R02922113 謝名宣 R02922064 黃宥勝 R02922077 黃志恒 R02945040 王亮之

The early introduction of dynamic programming into computational biology

(2)

Reference

The early introduction of dynamic programming into computational biology.

David Sankoff

2000. Bioinformatics, 16, 41-47.

(3)

Outline

Introduction

Dynamic programming for sequence comparison

Multiple alignment and phylogeny

Secondary structure

(4)

Introduction

(5)

About the author

David Sankoff

David Sankoff currently holds the Canada Research Chair in Mathematical Genomics at the University of Ottawa.

He studied at McGill University, doing a PhD in Probability Theory with Donald Dawson. He joined the new Centre de recherches mathématiques (CRM) of the University of Montreal in 1969 and was also a professor in the Mathematics and Statistics Department from 1984 to 2002.

He is one of the founding fathers of bioinformatics, and his fundamental contributions to the area go back to the early 1970s.

(6)

About the author

In 1971, Cedergren asked Sankoff to find a way to align RNA sequences. Sankoff knew little of algorithm design and nothing of discrete dynamic programming, but as an undergraduate he had effectively used the latter in working out an economics problem matching buyers and sellers. The same approach worked with alignment.

(7)

Introduction

In 1994-1995, DIMACS sponsored a theme year on computational biology.

Participating in a workshop organized by Alberto Apostolico and Raffaele Giancarlo led Sankoff to consider some of the early interactions in the field now known as computational biology.

A paper by Walter Goad on the impact of Stanislaw Ulam in this field puzzled Sankoff greatly.

(8)

Introduction

Ulam: 'I started all this'.

(9)

Introduction

Sankoff had also read a joint interview of Ulam and Mark Kac. It led him to reflect on this misperception on the part of Ulam, and to crystallize the realization that, ironically, Kac, his colleague of many years, had played a crucial role in the earliest development of the field.

This paper is dedicated to the memory of Mark Kac.

(10)

Introduction

In this article, Sankoff draws on his recollections of the earliest phases of the field to describe how certain fundamental ideas found their way into the vernacular of the computational biologist.

(11)

Dynamic programming for sequence comparison

(12)

Longest common subsequence

Maximum matching

Recurrence relation

Two sequences of lengths m and n, with terms from any alphabet

The two sequences are a(1),…,a(m) and b(1),…,b(n)

Use a^i for the prefix sequence a(1),…,a(i), and M(i,j) for the length of the longest common subsequence of a^i and b^j

(13)

Longest common subsequence

(14)

Longest common subsequence

initial condition

M(i,0) = M(0,j) = 0

The length of the longest common subsequence

M(m,n)

All longest common subsequences

Traceback routine on the matrix M
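The table filling and traceback described on these slides can be sketched in Python. This is a minimal illustration that assumes the standard longest common subsequence recurrence, M(i,j) = M(i-1,j-1) + 1 when a(i) = b(j) and max{M(i-1,j), M(i,j-1)} otherwise. The function names `lcs_table` and `lcs_traceback` are hypothetical, and the traceback returns only one of the possibly many longest common subsequences.

```python
def lcs_table(a, b):
    """Fill M, where M[i][j] is the LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    # Row 0 and column 0 hold the initial condition M(i,0) = M(0,j) = 0.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                M[i][j] = M[i - 1][j - 1] + 1
            else:
                M[i][j] = max(M[i - 1][j], M[i][j - 1])
    return M


def lcs_traceback(a, b, M):
    """Walk back from M[m][n] to recover one longest common subsequence."""
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif M[i - 1][j] >= M[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


a, b = "AGGTAB", "GXTXAYB"
M = lcs_table(a, b)
print(M[len(a)][len(b)], lcs_traceback(a, b, M))
```

Both loops visit each cell of the (m+1) by (n+1) table once, so the sketch runs in quadratic time.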

(15)

Edit distance

Stanislaw Ulam

Sequence comparison problem

Dynamic Programming-Sellers

Maximum matching

Cubic computing time

Can be done in quadratic time

(16)

Edit distance

Using D(i,j) for the minimum number of steps to convert a^i to b^j

(17)

Edit distance

initial condition

D(i,0) = D(0,i) = i

The edit distance between the two sequences

D(m,n)

All appropriate sets of edit steps

Traceback routine on the matrix D
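As a rough illustration of the edit distance table, the sketch below assumes the usual unit-cost recurrence: D(i,j) is the minimum of D(i-1,j) + 1 (deletion), D(i,j-1) + 1 (insertion), and D(i-1,j-1) plus 0 or 1 depending on whether a(i) = b(j). The function name `edit_distance` is hypothetical, and the unit costs are an assumption rather than something stated on the slide.

```python
def edit_distance(a, b):
    """Return D(m,n), the minimum number of edit steps turning a into b."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Initial condition: converting a prefix to or from the empty sequence.
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,                 # delete a(i)
                D[i][j - 1] + 1,                 # insert b(j)
                D[i - 1][j - 1] + substitution,  # match or substitute
            )
    return D[m][n]


print(edit_distance("kitten", "sitting"))
```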

(18)

Generalization

A different weight s>0

The longest common subsequence problem and the shortest edit distance problem become essentially identical when s ≥ 2
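One way to see this equivalence is to check it numerically. The sketch below assumes that s is the weight of a substitution while insertions and deletions keep weight 1 (the slide does not say which step carries s). Under that assumption, once s ≥ 2 a substitution is never cheaper than a deletion followed by an insertion, so the weighted distance equals m + n - 2·M(m,n). The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                M[i][j] = M[i - 1][j - 1] + 1
            else:
                M[i][j] = max(M[i - 1][j], M[i][j - 1])
    return M[m][n]


def weighted_edit_distance(a, b, s):
    """Edit distance with indel weight 1 and substitution weight s (assumed)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else s
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + substitution)
    return D[m][n]


a, b = "GAUUACA", "GCAUGCU"
for s in (2, 3):
    # For s >= 2 the two quantities printed on each line should agree.
    print(s, weighted_edit_distance(a, b, s), len(a) + len(b) - 2 * lcs_length(a, b))
```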

(19)

Optimal local alignment

Smith and Waterman

Dynamic Programming

Simple

Not-obvious

(20)

Optimal local alignment

(21)

Optimal local alignment

initial condition

L(i,0) = L(0,i) = 0

The score of the optimal local alignment between the two sequences

The maximum of L(i,j) over all i and j

All appropriate sets of edit steps

Traceback routine on the matrix L
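A minimal sketch of local alignment in the style of Smith and Waterman follows. The scoring values (match +1, mismatch -1, gap -1) and the function name `smith_waterman` are assumptions chosen for illustration, not values from the slides. The key difference from the global recurrences is the extra 0 in the maximum, which lets an alignment start anywhere, and the traceback starts at the largest cell and stops at the first 0.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    m, n = len(a), len(b)
    # Initial condition L(i,0) = L(0,j) = 0.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            L[i][j] = max(0,
                          L[i - 1][j - 1] + s,   # align a(i) with b(j)
                          L[i - 1][j] + gap,     # a(i) against a gap
                          L[i][j - 1] + gap)     # b(j) against a gap
            if L[i][j] > best:
                best, best_pos = L[i][j], (i, j)

    # Traceback from the best cell until the score drops to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and L[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if L[i][j] == L[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif L[i][j] == L[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("UUUACGUAAA", "GGGACGUCCC"))
```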

 

(22)

Multiple alignment and phylogeny

(23)

Multiple alignment and phylogeny

Cedergren and Sankoff became interested in assessing the relative rates of the 12 possible substitution mutations among the four bases {A,C,G,U}

Idea:

Isolate each position in the RNA

Count the number of mutations

Combine the data of all positions

(24)

Multiple alignment and phylogeny

The only task:

Align corresponding positions in all sequences

Count the number of mutations in all positions
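The counting step can be illustrated with a small Python sketch. It makes a strong simplifying assumption that is not in the slides: each pair already consists of an aligned ancestor and descendant sequence of equal length, with '-' marking gaps. In the actual problem the ancestral sequences are unknown and must be reconstructed along the tree. The function name `count_substitutions` is hypothetical.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"


def count_substitutions(pairs):
    """Count each of the 12 directed substitutions over aligned pairs.

    pairs: iterable of (ancestor, descendant) strings of equal length.
    Positions containing a gap character are skipped.
    """
    # Start every one of the 12 ordered base pairs at zero.
    counts = Counter({(x, y): 0 for x, y in permutations(BASES, 2)})
    for ancestor, descendant in pairs:
        if len(ancestor) != len(descendant):
            raise ValueError("sequences must be aligned to the same length")
        # Treat each position in isolation, then pool over all positions.
        for x, y in zip(ancestor, descendant):
            if x != y and x in BASES and y in BASES:
                counts[(x, y)] += 1
    return counts


counts = count_substitutions([("ACGU", "ACGA"), ("ACGU", "GCGU")])
print(counts[("U", "A")], counts[("A", "G")])
```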

 

(25)

Multiple alignment and phylogeny

Sankoff published a short paper with Cedergren and his student Cristiane Morel (Sankoff et al., 1973)

Significance of the paper

Mutation frequencies

Reconstruction of the ancestral sequence

Formal algorithm for multiple sequence alignment

(26)

Multiple alignment and phylogeny

Sankoff rushed off a manuscript containing this algorithm to Mark Kac, and requested him to communicate it to PNAS

After waiting 6 months for a reply from Kac…

Not good enough for some cases

Should optimize the tree topology simultaneously

Published his algorithm elsewhere (Sankoff, 1975)

(27)

Secondary structure

(28)

Secondary structure

 Stem

Given two regions a(i),…,a(i+k) and a(j),…,a(j-k), where a(i+h) pairs with a(j-h) for h = 0,…,k

[Figure: a stem pairing a(i),…,a(i+5) with a(j),…,a(j-5)]

(29)

Secondary structure

R-loops

Given a(i_r),…,a(k_r) are all unpaired, for r = 1,…,R

R = 1: Hairpin

R = 2: Interior loop

R ≥ 3: Multiple loop

Special case

e.g. bulge (R = 2)

[Figure: loops with R = 1 and R = 2, with unpaired segments running from a(i_1) to a(k_1) and from a(i_2) to a(k_2)]

(31)

Secondary structure

Secondary structure stems disrupted only by bulges and other interior loops could be detected by dynamic programming comparison.

Sankoff devised an iterative algorithm, but the method turned out to be very dependent on the crude energy estimates of the time. (1976)

(32)

Secondary structure

A single-pass dynamic programming algorithm was published by Ruth Nussinov. (1978)

Michael Zuker wrote a very effective and widely disseminated program based on Nussinov's principle. (1981)

Mark Kac invited Sankoff to give a talk at Rockefeller University, which re-stimulated his interest in secondary structure.
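To give a flavor of single-pass folding by dynamic programming, here is a simplified Python sketch in the spirit of Nussinov's approach. It maximizes the number of base pairs instead of minimizing free energy, so it is not the energy model used in Zuker's program. The allowed pairs (including G-U), the minimum hairpin loop length `min_loop=3`, and the function name `max_base_pairs` are all assumptions made for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    N[i][j] is the best count for the subsequence seq[i..j]; a pair (k, j)
    must enclose at least min_loop unpaired bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    # Fill by increasing span so shorter subsequences are ready first.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    # case: j pairs with k, splitting the problem in two
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + N[k + 1][j - 1])
            N[i][j] = best
    return N[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

The three nested loops give cubic running time in the sequence length.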

(33)

Secondary structure

The dynamic programming recurrence fundamental to folding may be represented as:

Simultaneously solving the folding and multiple alignment problems. (1985)

(34)

Thank you for listening
