從一級結構預測DNA結合蛋白之標的序列

(1)

Predi

國立臺

Gr Coll

從一 icting ta

臺灣大學

raduate In lege of Ele

N

一級結構 arget seq

指導 Adv

學電機資碩

nstitute of ectrical En

National T Ma

構預測 D quences

prim

Ch

導教授：

isor: Ye

Chien-Y

中華民 Ju

資訊學院碩士論文

Computer ngineering

Taiwan U aster The

DNA 結合 of DNA mary stru

林志瑋 hih-Wei

歐陽彥陳倩瑜 en-Jen O

Yu Chen

民國 100 uly, 201

院資訊工文

r Science g and Com

Universit esis

合蛋白之 A-bindin ucture

瑋 Lin

彥正博瑜博士 Oyang, P

n, Ph.D.

年 7 月 11

工程學研

Engineeri mputer Sci

ty

之標的序 ng prote

博士士 Ph.D.

.

月

研究所

ing ience

序列

eins based on

(2)

兩年決定幫助跨領長、

還要一起小姐面的

年前憑著喜歡定加入了歐陽助。最重要的領域計畫的研翊鍾學長在要謝謝元鴻以起打拼是很棒姐，雖然在去的確成長了許

歡生物的一陽彥正老師的就是歐陽研究過程中在我摸索生以及又仁，

棒的事情，

去年的時候許多，感謝

一股傻勁，就師的實驗室陽彥正老師中，我學到了生物資訊的過雖然我們研非常開心可候離開了我謝你。

誌謝

就決定進入

。在念碩士

、陳倩瑜老了許多東西過程中不時研究的主題可以認識兩

，讓我低迷

入生物資訊士班的這兩老師以及張西。另外還時的給我許題不同，但兩位。最後迷了一陣子

這個領域的年之間，得天豪老師的要感謝廷因多指教，令是有兩位夥要感謝我的

。但是在這

的研究，因得到了許多的教誨，在因學長、鈺令我成長許夥伴在這兩的前女友湯這之後我在

因此就多人的在參與鈺峰學許多。

兩年中湯穎婷在各方

(3)

結合設計何進 (posi 種而樣的準確常關物結的序結構

這篇起來樣以所提處，

DNA 究之

合特定 DNA 計生物實驗尋進行，並解釋

ition freque 而言，截至目的模型。由於確預測位置頻關心的研究議結構與紀錄不序列的新方法構，接著利用

篇論文使用了來，這篇論文以序列資訊為提之方法表現本論文所提 A 複合物結之進行。

A 序列的蛋白尋找這些 D

釋基因組中 ncy matrix) 目前為止，

於生物實驗頻率矩陣，

議題之一。

不同胺基酸法。當我們用該樣板與

了兩組資料文提出的新為基礎且利現略差。由提之方法，

結構之蛋白質

中文

白質在基因 NA 結合蛋中序列的變

) 是最常被只有一小部驗往往需要高加速這個研這篇論文針酸和核酸之間們拿到一條蛋與所得之知識

料去評估新方新方法可以達利用已知位置由於現存這些仍可幫助一質序列預測

文摘要

因調控中扮演蛋白質的標的

變異如何擾被拿來描述這

部分的轉錄高資金與人研究領域的針對這個問

間結合偏好蛋白質序列識庫進行位

方法的表現達到和它們置頻率矩陣些以序列資一些相關的測其標的序列

演重要的角的序列可以擾亂正常的基

這些標的序錄因子已經人力成本，

的進展，一問題，提出一

好的知識庫去列，會先挑位置頻率矩陣

現，和其他們一樣的預陣訓練所得資訊為基礎的研究，針列，所得之

角色，利用計以幫助我們了基因表現。

序列的模型從相關生物因此，如何直以來是生一個利用蛋白

去預測 DNA 選一個適當陣的預測。

利用三級結測效果；但之預測模型的預測方法對其同源序之預測結果將

計算方法預了解基因調

。位置頻率

，對大部分物實驗中取何利用計算生物資訊學蛋白質 DNA

A 結合蛋白當的樣板複

結構的方法但若與另一型相比，本法仍各有其序列已有蛋將有助於相

預測或調控如率矩陣

分的物取得這算方法學家非複合白之標複合物

法比較一個同本論文其侷限蛋白質相關研

(4)

Prote expre how norm used only PFM desir progr the k nucle Whe prote incor The comp comp that i meth since summ predi facili

eins that bi ession. Iden genes are r mal gene exp

models to a small fra Ms. Since bi red to deve

ress. Here, knowledgeb eotides is pr n given a q eins is selec

rporated wit proposed putational m pared struct is trained by hod perform

e different mary, it is icted based itates relate

ind specific ntifying targ regulated in pression. Po

represent su action of the

ological ex elop compu

a new met base contai roposed to p query protein cted and th th the built method is methods. It

ture-based m y collected ms slightly w

predictors concluded d on its prim

ed studies i

AB

c DNA seq get sequenc n cells and e osition frequ

uch target s e transcripti periments u utational m thod based ning the pr predict quan n sequence, he PFM pre

knowledgeb evaluated turns out th methods. O experiment worse. Even

usually h that a DN mary structu n the future

BSTRA

quences pla ces of a DN explain how uency matri sequences. H

ion factors usually requ methods with on existing reference o ntitative spe , a protein-D ediction is

base.

by two d hat the prop On the other tally determ n though, th have their NA-binding

ure using th e because t

CT

ay importan NA-binding

w genetic va ices (PFMs) However, u (TFs) have uire much t h satisfied g protein-DN

of contacts ecificities o DNA compl made base

datasets and posed metho

hand, whe mined PFMs

he proposed own advan protein’s he complex target seque

nt roles in protein help ariations cau

) are one of up to now, f experimen time and co accuracies NA comple between a of protein-D lex structure

d on the se

d compare od can predi

n a sequenc s is compar

d method s ntages and binding pre xes of its ho ences of pr

regulating ps to under use disrupti f the most w

for most sp ntally determ

ost, it is stro to speedu ex structure amino acids DNA interac

re of homolo elected tem

ed with ex ict as well a ce-based m red, the prop still has its d limitation eference ca omologues.

roteins with gene rstand ion of widely ecies, mined ongly p the s and s and

tions.

ogues mplate

isting as the ethod posed value ns. In

an be This hout a

(5)

solveed structure could be prredicted noww.

(6)

口試誌謝中文 ABS CON Table Figur Chap Chap

Chap

試委員會審定謝 ...

文摘要 ...

STRACT ....

NTENTS ....

es ...

res ...

pter 1 In

pter 2 L

2.1 Algo 2.1.1 2.1.2 2.1.3 2.2 Algo

2.2.1 2.3 Algo

2.3.1

pter 3 M

3.1 Mat 3.1.1 3.1.2 3.1.3

定書 ...

...

ntroduction Literature R orithms for

Predicting Predicting Predicting orithms of s

BLAST ..

orithms of s TM-align Methods ...

erials ...

Collection Collection Relating P

CO

...

n ...

Review ...

predicting p g the bindin g PFM by h g PFM base sequence ali ...

structure ali n ...

...

...

n of protein n of PFMs . PFMs to pro

ONTEN

...

...

protein-DN ng preferenc homologues

ed on structu ignment ...

...

ignment ...

...

...

...

n-DNA com ...

otein-DNA

NTS

...

...

NA binding s ce of DNA- s’ annotated

ural model ...

...

...

...

mplex structu ...

complex st

...

...

specificities binding pro PFMs ...

and potentia ...

...

...

...

ures ...

...

ructures ...

...

...

s ...

oteins ...

...

al functions ...

...

...

...

... #

...i

... ii

... iii

... v

.... vii

... viii

... 1

... 4

... 4

... 5

s ... 6

... 10

... 11

... 13

... 13

... 14

(7)

Chap

Chap REFE

3.2 Buil 3.3 Pred 3.3.1 3.3.2 3.3.3

pter 4 R

4.1 Mea 4.2 Vali

4.2.1 4.2.2 4.3 Perf

4.3.1 4.3.2 4.4 Eval 4.5 Disc 4.5.1

4.5.2 4.5.3 4.5.4 4.5.5 4.5.6

pter 5 C

ERENCE ..

lding the kn diction fram Template Building Refining Results ...

asuring perf dation sets . Training d Protein-D formance ....

Training d Protein-D luating SAB cussion ...

Differenc structures The effec How to se Similar pr Using the The frequ Conclusions ...

nowledgeba mework ...

selection an the predicte the PFM by ...

formance ....

...

data of SAB DNA comple ...

data of SAB DNA comple BINE ...

...

ces betwee s and their a t of differen elect a temp rotein seque e number of uency of am s ...

...

se ...

...

and contact r ed PFM by y knowledg ...

...

BINE ...

exes with an ...

BINE ...

exes with an ...

...

en DNA s annotated PF

nt contact d plate ...

ences bind s f contact ato mino acids an

...

...

residue subs DNA seque ebase ...

...

...

nnotated PF ...

...

nnotated PF ...

...

sequences FMs ...

distance cut- ...

similar DNA oms of conta

nd nucleotid ...

...

stitution ...

ence in the t ...

...

...

FMs ...

...

FMs ...

...

in protein ...

-off ...

...

A sequence act residues des ...

...

...

template ....

...

...

...

n-DNA com ...

...

es ...

s ...

...

...

...

... 14

... 16

... 18

... 21

... 21

... 22

... 25

... 30

mplex ... 30

... 32

... 33

... 34

... 35

... 38

... 40

(8)

Table Table Table Table Table

e 4-1 PFM b e 4-2 Differ e 4-3 Simila e 4-4 Simila e 4-5 Avera

become mo rence before arities of 24 arities of 12 age similarit

ore similar w e and after P 4 proteins of 2 proteins w ty under diff

Tables

with annotat PFM refinem

f different a with similari fferent distan

tion after re ment by the algorithms ..

ty > 0.7 ...

nce cut-off .

efinement. . e knowledge ...

...

ebase ...

...

... 23

... 26

... 29

... 32

... 33

(9)

Figur Figur Figur Figur Figur Figur Figur Figur Figur

re 2-1 Predi re 2-2 Work re 3-1 Prefe re 3-2 Predi re 4-1 Simi re 4-2 Distr re 4-4 Corre re 4-5 Cont re 4-6 Score

iction frame kflow of the erence betw iction frame larities und ribution of s

elation betw tact counts b es between

ework of Sc e study of A ween amino ework of th er training d similarities b ween protein

between am amino acid

Figure

chroder, A., Alamanova, acids and n his study. ...

data set of S between DN n sequence mino acids an

ds and nucle

es

et al’s work et al. ...

nucleotides ..

...

SABINE...

NA sequenc similarities nd nucleotid eotides unde

k ...

...

ces and anno and PFM s des ...

er the older

...

otated PFM similarities.

...

scheme ...

... 6

... 10

... 16

... 17

... 24

Ms .. 31

... 34

... 36

... 37

(10)

Ch

Prote regul binds cause assig comp DNA progr prote bindi

Posit conse of tra nucle

Ther Surfa affin intera on th mole

hapter 1

eins that bi lation of ge s can help t e disruption gned to a c putational m A-binding s

ress in the ein-DNA in ing specific

tion freque ensus of tar anscription eotides A, C

re are exper ace plasmo

ity of a pro actions, but he fact that ecular attach

Intr

nd to speci ne expressi to understan n of normal certain mol methods eff specificities developme nteraction p city informa

ency matrix rget DNA s factors (TF C, G and T o

rimental me on resonanc otein-DNA i t it can also t the angle hed to the o

oductio

ific DNA s on. Identify nd how gen gene expre ecular func ficiently. Ho

) remains ent of high-

arameters, t ation is still

x (PFM) is sequences th Fs) [1-3]. PF occur at eac

ethods for ce (SPR) is interaction d o be used to

of light re other side of

on

equences a ying the targ ne regulatio ession withi ction (e.g.,

owever, qu insufficient -throughput the determi laborious.

s a simple hat can be r FMs indica ch position w

determining s one of th

directly. SP o measure p eflection fro

f the surfac

are extremel get sequenc

n proceeds n cells. In r transcriptio antitative fu t for the re t technologi

ination of h

probabilist recognized ate for a cer

within the b

g the bindi e methods PR is often u protein-DNA om a surfac e. DNA can

ly importan es of DNA- and how g recent years on factor) b functional in

equirement.

ies for the highly resol

tic model t by a DNA- rtain TF how binding site.

ng specific for measur used to stud A interaction

ce depends n be attache

nt for the p -binding pro genetic varia s, proteins c

by biologis nformation . Despite r measureme lved quanti

to represen -binding do w frequentl .

city of a pr ring the bin dy protein-l

ns. SPR is b on the ma ed to the sur

proper oteins ations can be sts or

(e.g., recent ent of

tative

nt the omain

ly the

otein.

nding igand based ass of rface,

(11)

and t the f meas (PBM array prote amou antib prote synth deter

Usin than meth seque and subse are c some appro are s conse appli for th

then protein formation of sured, and t Ms) are ano ys of over 4 ein is added unt of prote body to the eins still ne hesized in rmine the bi

ng computat using expe hods for PF ences to wh determine equences. S currently on e structure- oaches are shown to p erved and d ied the met he PFM sim

ns are added f protein-D then the bin ther techno 44,000 spots d into the ar eins that bin e protein. D eed to be

vitro. So inding spec

tional meth erimental m FM inferenc hich the tran the frequen Such metho nly availabl -based app based on a perform we do not allow thod suppor milarity of b

d to change NA comple nding affinit logy for me s which con rray, and is nd to a spe Despite the prepared fo it still spen ificity of pr

hods to mod methods doe ce of a tra nscription f ncy of each ods require

le for a sma proaches w analyses of ell in tellin w degenerat rt vector reg between pro

e the reflect ex and the o

ties can be m easuring bin ntain all pos then washe ecific DNA e significan for these m nds some rotein-DNA

del binding es. One of anscription

factor can b h position sufficient s all amount were also p

protein-DN ng which p tion [4-7].

gression (S oteins based

tion angles off-rate for

measured. P nding specif

ssible comb ed to remov

spot is det nt progress methods, eit

time using A interaction

specificity f the most factor is to bind, and th

among the sequences fo

of DNA-bi presented t NA complex

positions in On the othe SVR) to pre d on their pr

of the light its dissocia Protein bind ficity of pro binations of ve nonspeci termined wi of experim ther purifie g experimen ns.

can spend widely use o collect a en to condu e detected o for pattern d

inding prote to predict x structures n a PFM s er hand, Sch edict a quan

rimary struc

t. The on-ra ation can th ding microa oteins. PBM f DNA 10-m ific binding with a fluore mental met ed form cel ntal metho

d much less ed computat

set of prom uct motif fin

over-repres discovery, w teins. Previo

PFMs. Se s These me

should be chroder, A., ntitative me cture, and fu

ate for hus be arrays M uses mer. A g. The escent thods, lls or

ds to

s time tional moter nding sented which ously, everal ethods more et al.

easure urther

(12)

predi

How nucle to ha at pr from prote prote struc as a proba migh were that d PFM

In Ch prote meth meth inclu

ict the PFM

wever, in [8 eotides was ave an annot roviding an m primary s ein-DNA co ein based o ctures is pro

PFM that e ability equa ht bind to s e used to inf describes th Ms built by th

hapter 2, se eins and pre hod is introd hod is show uded in Chap

M of a protei

8] , the pr not consid tated PFM,

alternative structure, p omplex stru on its prim oposed. The

every colum als to one.

similar DNA fer the PFM he contact fr he DNA seq

everal metho evious studi

duced in Ch wn and som

pter 5.

n [8].

reference b ered. In add which is no to predict provided th ucture. In th mary structu e DNA sequ mn contains Based on t A bases, th M of a given requency be quence in a

ods propose ies of protei

hapter 3, an me discussio

between am dition, it req ot usually a

target DNA hat any hom his thesis,

ture and a uence in a p

only one o the idea tha he sequence n protein seq

etween ami protein-DN

ed for predi in-DNA inte nd in Chapt on is made.

mino acids quires homo available. In A sequences

mologue o a novel app

collection protein-DNA of the nucle

at similar c es in protei quence. Fur no acids an NA complex

icting target eractions ar ter 4, the pe

Finally, co

in protein ologues of t this regard s of a DNA f the query proach to p

of protein A complex eotides (A, contact resid in-DNA com rthermore, a nd bases is b x structure.

t sequence o re introduce

erformance onclusions o

ns and bas the query pr d, this thesis A-binding pr y protein h predict PFM n-DNA com

can be reg T, C, or G) idues of pro

mplex struc a knowledg built to refin

of DNA-bin ed. The prop of the prop of this thes

es in rotein s aims rotein has a M of a

mplex arded ) with oteins ctures ebase ne the

nding posed posed is are

(13)

Ch

In se intro speci solve speci exam comp

The seque while PFM sectio

2.1

2.1.

In th 168 h obtai differ

hapter 2

ection 2.1, duced. Giv ificity can b ed structure

ificity can b mple, protei

plex structu

technique o ence alignm e comparin M prediction

on 2.3.

Algor speci

1 Predic

he study of homeodom ined using p

rent mach

2 Lite

different m ven a protein

be inferred e (i.e. prot be inferred in binding ures.

of sequence ment is intro ng the propo

n, so an alg

rithms ificities

cting the b

Berger, et a ains agains protein bin hine learnin

rature

methods for n with a so by structur tein with o

by protein- microarray

e alignmen oduced in se

osed metho gorithm of

for p

binding pre

al. [9] the Z t all the pro ding microa ng algorith

Review

predicting olved protei ure-based PF

only primar -DNA bindi y (PBM) da

nt was used ection 2.2. I od in this th f structure a

predictin

eference o

Z-score tran obabilities o arrays (PBM hms to pr

w

protein-DN in-DNA com FM predict ry structure ing informa ata, annotat

d in this the In addition, hesis with t alignment,

ng prot

of DNA-bi

nsformed re of 32,896 8 Ms). After redict the

NA binding mplex struc

ions. For a e informati ation of its ted PFMs,

esis, thus th structure a the existing TM-align

tein-DNA

inding pro

elative sign 8-mer DNA

that, Alleyn Z-scores b

specificitie cture, its bin a protein wi

ion), its bin homologue or protein-

he basic id alignment is g structure-b

is introduc

A bind

oteins

nal intensitie A sequences ne, et al. ap between m

es are nding ithout nding es, for -DNA

dea of s used based ed in

ding

es for were pplied mouse

(14)

homo In th into conv binar 20 o binar vecto fores funct Z-sco algor revea bindi

2.1.2

Schro predi struc featu The p i)

ii)

oedomains he study of feature set verted into n

ry vectors o rthogonal v ry vectors or of 20 zer st regression

tion kernel ore. It turn rithms. The als the fact ing preferen

2 Predic

oder, et al.

ict PFMs of ctural and p ures to train

prediction f Feeding trained S (the SVR

and indivisu [10], the c s for the tr numerical e of length l vectors and

correspond ros. With de n, support l, and prin ns out that e root mea t that, the nce of prote

cting PFM

. applied th f proteins ba physicochem n the SVR m

frame work the sequen SVR models R model) re

ual DNA 8- ontact regio raining of p encodings r

×20 digits, an amino ing to resid erived featur

vector regr ncipal comp t nearest n an square e

amino acid eins.

M by homo

he regressi ased on thei mical prope models in or

is described nce, organis

s;

porting the

-mers [10].

on of these prediction m representing

i.e. the 20 acid seque dues at eac res, classifi ression with ponents reg neighbor p error (RMS ds within c

ologues’ an

ion method

ir primary s erties, and rder to pred d as followe sm, and stru

PFMs with

proteins w models. Th g amino aci different am nce was re ch position ers includin h linear, po gression we erformance SE) of near ontact regio

nnotated P

d support v structure [8]

phylogenet dict the PFM

ed (Figure 2 uctural supe

h similarity h

were aligned he sequence id sequence mino acids w

presented b . Gaps wer ng nearest n olynomial,

ere applied best com rest neighbo

ons do hav

PFMs

vector regre

]. Pairwise a tic distance M similarity

2-1):

erclass of th

higher than

d and transf e alignment es of length were encod by concaten re encoded neighbor, ran

and radial d to predic mparing to or is 0.76.

ve influence

ession (SVR alignment s es, were us y of two pro

he protein t

n a threshold ferred

t was h l as ded as nating d as a

ndom basis ct the

other This es on

R) to scores, ed as oteins.

to the

d;

(15)

iii) iv)

The a scale rando signi

2.1.3

Moro PFM

Filtering Merging

Fig

average abs e from 0 to omly sampl ificantly hig

3 Predic

ozov, et al.

Ms [5, 6]. T

g outlier PFM g the remain

gure 2-1 Pre

solute error o 2. In com led is 0.64, gher than th

cting PFM

developed This model

Ms;

ning PFMs t

ediction fram

(AAE) of t mparison, th indicating e performan

M based on

a simple n only cares

to get the fin

amework of

this framew he average s

that the pr nce expecte

n structura

null model s about the

nal predictio

Schroder, A

work with de similarity b redicted per ed by random

l model an

called ‘con e number o

on;

A., et al’s wo

efault param between two rformance o

m guesses.

nd potenti

ntact’ mode of atomic c

ork

meter is 0.12 o PFMs tha of SVR mo

ial functio

el for predi contacts bet

2 on a at are del is

ns

icting tween

(16)

prote colum

, wh prote

d param phys and h Diffe the a amin know Amo of ea funct To c dista wher of th dista prote comb

ein side cha mn is cal

here is th ein and DN denotes the meter. This ics-based p has the adva erent to the all-atom kn no acid typ wledge-base ong the serie asy implem

tions, cFIRE onstruct the ance falling

re = 3, 4, he values of ance falling ein-DNA co binations, th

ains and DN lculated as f

1 4 1 1

4 1 3

he observed NA respectiv

base type o model has potential fun antage of co contact mo nowledge-ba pes into con ed potential

es of all-ato mentation an E and vcFIR e knowledg g within a , 5, 6, 7, 8, f . In this

within a sp omplex stru

he potential

NA bases. Th follows:

⁄

d number o vely is defin observed in been shown nction when

onsuming m odel, which ased potent nsideration.

function th om scoring f nd is shown RE, in predi gebase, the

specified r 9, and 10 (Å

study, the pecified ran ctures colle between at

he probabil

of atomic c ned to be in n the positio

n to be gene n a native pr much less co h considers tial function . The FIRE hat consider functions pr n to be gen icting PFMs

number of range Å), and ∆

number of nge,

ected from P tom types i

ity of the ba

, 0 , 1

contacts (a n contact if on i in the c

erally as go rotein-DNA omputation

only the co n presented E potential

rs interactio resented in nerally as g

s.

f pairs of a

∆ , wer is set as 3 pairs of ato , , , are c PDB [11]. W

and j is repr

ase pair typ

heavy atom they are cl omplex, od as the st A complex i

cost.

oordinates o d by Zhou, function [ ons between

[7], FIRE h ood as two

atom types re denoted

for = 3 a om types calculated b

With resented as

pe in the

m pair from loser than 4

and is a tatic model

s considere

of residue a et al. take 7] is a suc n all atom t has the adva o of its exte

i and j wit

as ,

and 1 for th and wit based on the

, , of a follows:

PFM

m the 4.5Å),

a free using d [5],

atoms, es the

ccinct types.

antage ended

th the , , , he rest th the e 990 all the

(17)

wher 10Å dista given of th

Assu and t

wher comb PFM

wher Alam

wher atom

re , , . The va ance-depend n complex, e observed

ume that inf thus G can

re ∆ is t bining two Ms as follow

re is a fr manova, et a

re is the m types.

, alue of dent number

the binding atom pairs

fluences on n be represe

the binding equations a s:

ree paramet al. used ano

set of atom and

, ,

, ⁄∑ is set as r of ideal-g g free energ

[5]:

∆

binding fre ented as foll

∆

g free energ above, we

∑ _∈ ter.

ther potenti

ln |

mic distance are the nu

ln 0

, , , 1.61 becau gas points in gy, ∆ , is d

,

ee energy fro lows:

∆ ∆

gy of a bas can estimat

∆

∈ , , ,

ial function

s betw umber of a

, , , ,

use it best n finite pro defined as t

, ,

om differen

e  (A, T, te the prob

∆

proposed in

ln |

ween interfac atoms in th

∆ ∑⁄ fits of otein-size sp

he sum of a

nt positions

C, or G) a abilities in

n [12] to pre

, ,

ce atoms, he protein

∆ , to the a pheres [7].

all the pote

are indepen

at position each colum

edict PFM

and ar and DNA.

actual For a entials

ndent,

i. By mn of

[4].

re the . The

(18)

proba

wher proba expre

Whe separ For seque calcu estim

, wh matri DNA linea colum

The w

ability of at

re |

ability, esses the lik

re

rated by dis each DNA ences were ulated by th mated by sol

ere is a ix of dimen A sequence.

ar equation i mn of PFM

workflow o

tomic conta

, , is is the B kelihood of

, ,

, , is stance . A sequence e generated,

he formula lving the lin

vector of 4 nsions (4N,

. In additio is solved by s were estim

of this study

acts was mo

| , ,

the likeli Bayesian pr observation

s the numb

of length N , and their as above. T near equatio

4N dimensi 4N+X), wi on, is a y least squa mated as fol

∑ _∈ y is describe

deled as the

ihood func rior and wa n a native-li

, ,

mber of co

N in the p potentials w The weight on:

ions of the ith each row a vector con ares optimiz

llows:

∆

∈ , , ,

ed as Figure

e likelihood , , |

, , ction,

as set to o ike interatom

∑

ntacts obse

protein-DNA when boun s of each

estimated w of the ma nsists of 4N zation. Final

∆

∆ e 2-2

d:

, , is ne. The fo mic distance

, , , , erved betw

A complex, nd by the q position in

weights, an trix A corre N+X poten lly, the prob

s the mar ollowing for

e:

ween an

, 4N+X ran query protei n the PFM

nd A is a b esponding t ntial values babilities in

rginal rmula

nd

ndom in are

were

binary o one . The n each

(19)

2.2

Sequ BLA uses wher seque hand time

2.2.

BLA

Algor

uence alignm AST are the

dynamic p re n and m

ences. In th d, BLAST u than Smith

1 BLAS

AST (Basic

Figure 2-2

rithms o

ment is a fu most famou programmin m are the l his regard, i uses a heuri h-Waterman

ST

Local Align

2 Workflow

of sequen

undamental us algorithm

g to solve t length of th

t is not use istic approa while align

nment Sear

of the study

nce align

problem o ms for local the problem he sequenc ful while se ach to searc ning long se

rch Tool) is

y of Alaman

nment

f bioinform l sequence a m, and its ti ces. So it t

earching a h ch the solut equences.

s a local seq

nova, et al.

matics. Smith alignment. S ime comple takes lots o huge databa tion, and it

quence align

th-Waterman Smith-Wate exity is O(n of time to ase. On the takes much

nment algo n and erman nm), align other h less

orithm

(20)

for b align and f BLA to its solut being The m



2.3

Prote rangi prote struc DAL

2.3.

TM-a

biological s nment [13].

further find AST is one o

s high spee tion as Smi g widely us major steps Making a sequences;

Constructin Matching q Scanning th Extending

Algor

ein structur ing from pr ein function cture alignm LI [16] and S

1 TM-al

align only

sequences, We can al its similar s of the most ed for searc

ith-Waterma ed.

s of BLAST list of nei

ng a diction query with s he database

the match in

rithms o

ral compari rotein fold n annotatio ment [14]. I SAL [17].

lign

employs t

and we can ign a query sequences i

commonly ching a solu

an does, its

T are listed a ighborhood

nary for all w score higher e for words;

n both direc

of structu

sons are em classificatio on. TM-alig It is ~4 tim

the informa

n use it fo y sequence in the databa y used progr

ution. Altho s high spee

as followed:

d words, fo

words in the r than a thre

ctions when

ure align

mployed in on, protein gn is one mes faster th

ation of ba

r protein se with a sequ ase.

rams in the ough it doe ed still mak

:

or example,

e query;

eshold;

n a match is

nment

n many bra structure m of the mo han CE [15

ackbone

equence or uence datab

field of bio es not guara kes it very

, length 4

found;

nches of st modeling to ost famous 5] and 20 t

coordinate

r DNA sequ base by BL

oinformatic antee an op useful, and

for amino

tructure bio o structure-b s algorithm times faster

es of the uence LAST,

cs due ptimal d thus

acid

ology, based ms for

r than

given

(21)

prote 1.

2.

ein structure Initial struc There are alignment proteins us on the gap against the programmi initial align Heuristic it In this proc initial align

where residue of 1.24 alignment S , . The a new sco converges.

e. The algor cture alignm

three initi is obtained ing dynami pless matchi e larger p ing, but the nment.

teration cedure, stru nment [18].

is the dista the second

15 1.

can be obta en a new TM

re matrix i Finally, the

rithm execu ment

ial alignme d by assign ic programm

ing of two protein. The e scoring m

uctures are r The score s

S ,

ance betwee d structure .8, i ained by im M-score rota is obtained e alignment

utes the follo

ents being ning second ming. The se

proteins. T e third in matrix is a co

rotated by T similarity is

1 ⁄

en the th r under the s the leng mplementing

ation matrix . The proc

with the hi

owing steps

exploited.

dary structu econd type The smaller itial alignm ombination

TM-score r s defined as

1

⁄

esidue of th TM-score gth of the

g dynamic p x is obtained edure is re ghest TM-s :

The first ure to each

of initial ali protein is g ment is al

of the first

otation mat :

he first struc superpositi

smaller st programmin d by the new epeated unti score is retu

type of i h residue of ignment is b gapless thre lso by dyn t and the se

trix based o

cture and th ion.

tructure. A ng on the m w alignmen til the align urned.

initial f two based eaded namic econd

on the

he th

new matrix t, and nment

(22)

Ch

Altho espec allow struc struc In th these

As in speci PBM prote by us

3.1

3.1.

Prote colle given prote seque criter

hapter 3

ough the st cially in tel w degenera cture-based cture solved his regard, t e proteins.

ntroduced i ificities bas M or PFM.

ein-DNA bin sing protein

Mate

1 Collec

ein-DNA co ected [11]. P n query pro ein-DNA co

ences of th ria: a) it i

3 Met

tructure-bas lling which ation, it is PFM pred d. However, this thesis a

in Chapter sed on thei However, nding infor n-DNA com

erials

ction of pr

omplex stru Protein-DNA otein sequen

omplex str he query. Th

s an X-ray

hods

sed PFM pr h positions s strongly diction can

there are s aims to dev

2, there are ir homolog

protein-D mation. Her mplex structu

rotein-DN

uctures fro A complexe nces. Since ructure of he template y structure

redictors ha in a PFM s

desired to be applied still many p velop a me

e algorithm gues’ protei DNA compl re a novel m ures of its h

NA comple

om the 27 es were col similar pro the query’

e structures with resol

ave been sh should be m o design a d only whe proteins do n

thod to pre

ms for predi in-DNA bin lex structur method for homologues

ex structur

February 2 llected for f oteins binds s homolog were requi lution bette

own to per more conse a new alg en the quer not have a edict the tar

cting prote nding inform res provide

inferring PF is proposed

es

2009 releas finding a pr s similar DN

ue helps p red to satis er than 3.0Å

rform well erved and d gorithm be ry protein h

solved stru rget sequen

ein-DNA bin rmation, suc e some kin FMs of a pr d.

se of PDB roper templa NA sequenc predicting t sfy the follo Å, b) the

[5-7], do not

cause has a cture.

nce of

nding ch as nd of rotein

were ate of ces, a target owing DNA

(23)

mole chain prote abov

3.1.2

Anno anno of hu

3.1.3

After struc comp prote supp After anno

3.2

Befo hiera repor simil

ecule has  n has  5 c ein chain h ve.

2 Collec

otated PFM otated PFMs uman, 802 o

3 Relatin

r collecting ctures the c plex structu ein have the

osed to bin r mapping P otated PFM

Build

ore building archical clus rted by BL larity to the

6 paired b ontact resid has 40 res

ction of PF

Ms of huma

s of yeast w of mouse, an

ng PFMs

g the data collected P ures by the e same entry

nd to the D PFMs to the are remaine

ding the

g the know stering algo LAST [13].

e other prote

ases and ha dues (residu sidues. The

FMs

an, and mo were collecte

nd 117 of ye

to protein

above, the PFMs are a eir entry na

y name, they DNA sequen

e protein-DN ed, correspo

knowled

wledgebase, orithm Hom

In each cl ein chains is

as less than ues within 4 ere are 990

ouse were c ed from MY

east were co

n-DNA com

next step associated.

ames in Un y are associ nces similar NA complex onding to 26

dgebase

we cluster moClust [22

luster, the p s assigned a

n 30% non- 4.5Å to the

protein ch

collected fr YBS [20]. T

ollected.

mplex stru

is to know PFMs wer niProt datab iated with e

r with thos x structures 6 different p

red the col ] based on protein chai as the repre

paired base DNA mole hains satisfy

om TRAN Totally, 592

uctures

w with whic

re related base [21]. I ach other, i se of the sa s, 119 protei proteins.

lected prot pair-wise s in with the sentative of

es, c) the pr ecule) and d fying the cr

SFAC [19]

annotated P

ch protein- to protein- If a PFM a i.e. this prot ame entry n

in chains w

tein chains sequence ide e largest av

f the cluster rotein d) the riteria

, and PFMs

-DNA -DNA and a tein is name.

ith an

by a entity verage r. The

(24)

seque In th The 260 t types for a discr are c nucle

,whe obser betw

wher and 0.25 and n The that Lysin On th

ences of co e end, 260 r contact resi templates w s of amino a amino acid riminate dif considered.

eotides are c

ere mean rved contac ween and

re f is t f is the in this stud nucleotides.

scores betw nucleotides ne) and am he other han

ollected 990 representati idues of pro were defined

acids and fo ds, only the fferent amin After cou calculated b

ns a type of ct counts o d . The exp

the frequenc e frequency

dy and i .

ween amino s prefer am ino acids w nd, it is inte

protein-DN ives were us otein chains

d first. With our types of

e atoms of no acids. Si unting the by the follow

S ,

f amino acid of and

pected conta , cy that amin y of the app is the obser

o acids and mino acids with polar (A

eresting that

NA complex sed to build s (residues w

h this inform f nucleotide f the side imilarly, for contacts th wing equati

log

d , mean , and , act number

f ∙ f no acid a pearance of rved total c

d nucleotide with positi Asparagine t Serine pref

xes were us d the knowle with 4.5Å to mation, con es were repo

chains are r nucleotide he scores ion.

, ,

ns a type o is the e is calculate

∙ appears in t f the nucleo

ontact coun

es are show ive charges and Glutam fers Thymin

sed as input edgebase.

o any nucle ntact counts

orted respec considered es, only the

between am

f nucleotide expected co ed by

he Swiss-Pr otide , w nts between

wn as Figure s (Arginine mine) to oth ne to other n

t for HomoC

eotide prese s between tw

ctively. Not d, because e atoms of

mino acids

e. , ontact frequ

rot database which was s n all amino

e 3-1. It ap e, Histidine her amino a nucleotides

Clust.

ent) in wenty e that they bases s and

is the uency

e [21], set to acids

ppears , and acids.

.

(25)

3.3

Whe follo i) ii) iii) . The p

‐3

‐2.5

‐2

‐1.5

‐1

‐0.5 0 0.5 1 1.5 2 2.5

F

Predi

n a query p wing three

Template Building Refining

prediction f

igure 3-1 Pr

iction fr

protein sequ steps.

e selection a g a predicted g the PFM b

framework o

reference be

amewor

uence is giv

and contact d PFM by D by the know

of this study

etween ami

rk

ven, the PFM

t residue sub DNA sequen wledgebase

y is shown a

ino acids an

M predictio

bstitution nce in the te

as Figure 3-

nd nucleotid

on will be p

emplate

-2

des

performed bby the

A T C G

(26)

Figur datab Gene know PFM

re 3-2 Pred base by B erating a pr wledge base M refinement

diction fram LAST and redicted PFM

e and the d t by DNA s

mework of th d find out M from the distance bet

equence in

his study. (A the contac e DNA sequ tween nucl the templat

A) Selectin ct residues uence. (C) C

leotides and te structure.

ng a templat that are Calculating d the conta

te from tem substituted

the PFM b act residues

mplate d. (B)

by the s. (D)

(27)

3.3.

Give chain selec BLA nucle repla will perfo

3.3.2

Since temp selec be ob other

3.3.3

Befo PFM struc struc matri betw

1 Templ

en a query p

ns in temp cted as tem AST, there w

eotide) of th aced by. No

be perform ormed.

2 Buildi

e similar pr plate structu cted, a PFM btained. If rwise,

3 Refini

ore refining M is built b

cture, we ca cture is , th

ix M is ca ween amino

late select

protein sequ

late databa mplate of th will be mism

he template ote that not f med. If there

ing the pre

rotein seque ure gives us M with proba

the position

, 0

ing the PF

the PFM b by the know

an calculate hen a PFM alculated by acids and n

ion and co

uence, a pr ase using B

e query. A matches on

protein. Th for all the g e’s no temp

edicted PF

ences bind s s important

ability of ei n of the 0

FM by kno

built by the wledgebase a column o of length y the conta

ucleotides.

ontact resi

roper homol BLAST. Th According to

the contact he contact re given query

plate with e

FM by DN

similar DNA

information ither one or

DNA sequ

owledgeba

DNA sequ

first. For of the PFM is constru act residues

idue subst

logue is sel he protein w

o the seque t residues ( esidues of th

protein seq e-value < 0

NA sequen

A sequence n to infer P r zero based ence is bas

ase

ence in the each DNA M. If the DN ucted. To co s of each b

titution

lected from with the lo ence alignm residues wi he template quences, the .001, no PF

nce in the t

s, the DNA PFMs. After d on the DN se , then

template st A base pair NA sequence onstruct the base pair an

m the 990 pr owest e-val ment reporte

ithin 4.5Å t e protein are e PFM predi FM predicti

template

A sequence i

r a template NA sequenc

,

tructure, an in the tem e in the tem e PFM, a sc

nd the dist rotein lue is ed by to the e then iction ion is

in the e was ce can 1,

nother mplate mplate coring ances

(28)

M , the s base, conta an in defin temp resid After

wher know

For know toget by th done

, i conta

M

is the sc score betwe

, i.e. , act residues nverse ratio ned as the sh plate. Here dues closer t r calculating

re is a fr wledgebase.

refining th wledgebase, ther. If the c he DNA seq e by the follo

, is the ratio act residues

M , ∑

core of nuc een contact

. For each s Γ of this o of the dis hortest atom

is a fre to the nucle g M, we can

,

ree paramet .

he PFM bu , the two P contact resi quence in th

owing equa

, of the num s of the p

∈ ,

∑ _∈ leotide a

residue h position

position. T tance t m pair distan ee paramete

otide are fa n obtain a P

∙

∑ _{∈ , , ,} er. Now we

uilt by the PFMs built dues of a ba he template ation.

∙ 1 mber of subs

osition a

∙

at position and the nu

in the te The weight to the DNA nce between er that can avored as PFM by

,

∙ , e have a sec

e DNA se t ( a ase pair are structure sh

stituted con

and 0

1~ , ∈

and it equ ucleotide emplate stru

of each co A molecule n amino aci be adjuste gets large

1~ ,

cond PFM,

quence by and ) e conserved, hould be kep

, ∙ ntact residu

if there is

∈ , , ,

uals to the w defined by ucture, we ntact residu in the temp id and DNA d by the u r.

∈ , ,

which is co

the PFM ) are going , then the in pt, so the re

1~ , es over the

no contac

weighted su y the know

have the s ue is giv plate, and A molecule i user. The co

,

onstructed b

built with g to be m nformation efinement c

∈ , , , total numb ct residue a

um of wledge set of ven in

is in the ontact

by the

h the merged

given can be

ber of at the

(29)

posit

For t resid these 0.25.

provi tion.

the position dues, they ar e positions, . These pos ide any use

ns that are c re regarded

four types sitions are t ful informa

close to the d as unimpo of nucleotid trimmed bef ation.

e starting or ortant positio

des would b fore reporti

r ending po ons in this p be assigned ing the pred

ositions and protein-DN with the sa diction beca

d without co NA interactio ame probab ause they d

ontact on. In bility : do not

(30)

Ch

In th Intro In se other

4.1

To c was e calcu

wher are th

4.2

4.2.

In [8 differ SAB

hapter 4

his chapter oduction of ection 4.3 a r methods is

Meas

ompare the employed, r ulate the sim

re is the he probabili

Valid

1 Traini

8], 1239 pr rent databa BINE. These

4 Resu

r, a measu validation s and 4.4, the

s introduced

suring pe

e predicted referring to milarity betw

1

number of ities of nucl

dation se

ng data of

rotein seque ases. 453 o e protein seq

ults

ure to com sets that us e performan d. Finally, d

erforma

PFM and t the similar ween two PF

1 1

√2

f positions th leotides

ts

f SABINE

ences with of them we quences wer

mpare PFMs ed to test p nce of the p discussions a

ance

the annotate rity function

FMs, say

∈ , , ,

hat two PFM at position

E

their annot ere public re used to e

s is first i performance proposed m

are made in

ed PFM, a n used in [2 and , th

, ,

Ms aligned in PFM

tated PFMs and attach evaluate the

introduced e is followe ethod and c n section 4.5

measure ca 23] for comp

ey are align

together, an and

s with were hed in the proposed m

in section ed in section

comparison 5.

alled ‘Simil paring PFM ned to maxim

nd _, and , respective

e collected source cod method.

n 4.1.

n 4.2.

n with

larity’

Ms. To mize

d _, ely.

from de of

(31)

4.2.2

As m with prote for p BLA prote evalu

4.3

Two 4.3.1 in th requi propo comp as we

4.3.

453 p the p with PFM simil

2 Protein

mentioned i

known PF ein sequenc prediction. W AST with oth

eins cannot uation.

Perfo

validation 1, we can se he template

ire a prote osed meth plexes with ell as struct

1 Traini

protein sequ proposed m

e-value <

Ms built by e larity of 0.6

n-DNA co

n section 3 FMs, and th ces of the pr When evalu her 989 tem find a temp

ormance

sets descri ee the impro

structures ein-DNA co

od is com h annotated

ture-based m

ng data of

uences cont method. 96 o

0.001, so 3 employing 632, after r

omplexes

3.1.3, we ha

hey were a rotein-DNA uating the p mplate seque plate with e

e

ibed in 4.2 ovement aft by the trai omplex str mpared with

PFMs, and methods pre

f SABINE

tained in the of these pr 357 protein

the DNA se efining

with anno

ave 26 diffe also used to A complex s proposed me ences and fi

e-value < 0

were used ter the refin ining data ructure to h structure

it turns out edict.

E

e training d rotein seque sequences equence in by the

otated PFM

ferent protei

o evaluate structures w ethod, the q

nds a templ .001, so we

d to evaluat ning the PFM

of SABINE perform pr e-based me t that the pr

data of SAB ences canno

were tested the templa knowledge

Ms

in-DNA com the propos were used a query protei

late. Howev e finally use

te the propo Ms built by E. Structure rediction, s ethods usin roposed me

INE were u ot find a te

d. As show ate ( ) ebase, an av

mplex struc sed method as an input q

in sequence ver, there ar e 24 protein

osed metho DNA sequ e-based me so in 4.3.2 ng protein- ethod can pr

used as quer emplate stru wn in Figure have an av verage simi

ctures . The query e runs re two ns for

od. In ences ethods 2, the

-DNA redict

ry for ucture e 4-1, verage

ilarity

(32)

of 0 infer In Ta refin proba proba anno of an becau temp resul impr infor

.682 is ach rring the PF

able 4-1, it nement. For ability = ability to C otated PFM.

nnotated PF use there is plate structu

lt. It can roving the p rmation give

hieved. Th M of a quer

is shown t r example, 1 to Thym Cytosine, s . Similar sit FM. On the s no any co ure, so pred

be observe performanc en by the D

his shows t ry protein se that PFM b

, at the 1 mine, after

o the PFM tuations can e other han ontact resid dictions at th

ed that trim e, because DNA sequen

that refinem equence.

becomes mo 11^th positio refinemen M after refin

n be observ nd, PFM af due at 1^st~

hese positio mming the

trimming t nce in the tem

ment by th

ore similar on of anno nt, the kno

nement bec ed at 7^th, 8^t fter refinem

4^th positio ons were tri ese unimpo these positi mplate struc

he knowled

to the anno otated PFM owledgebase

comes more

th, 9^th, 10^th a ment is shor n of DNA immed befo ortant posit

ions filters cture.

dgebase do

otated PFM M,

e divides re similar t and 13^th po rter than

sequence i ore reportin tions also the unimpo

help

after gives some to the

sition , in the ng the helps ortant

(33)

F

Figure 4-1 SSimilarities under trainning data sett of SABINEE

(34)

An

DN o

re k

4.3.2

Next For t an av by kn know obser PFM after

Table 4

nnotated PFM

NA sequenc of template

PFM after efinement by

knowledge

2 Protein

t, 24 protein the PFMs b verage simi nowledgeba wledgebase

rved that m Ms built by e

refinement

4-2 PFM be

M

ce

y

n-DNA co

n sequences

uilt by emp ilarity of 0.6

ase, an aver we built do most of all th employing t t)

ecome more B

omplexes

in protein-D ploying the D 642 over th

rage simila o help refini he cases hav

the DNA se

e similar wit Binding Pro

with anno

DNA comp DNA seque he 24 protei arity of 0.71

ing the pred ve improve equences of

th annotatio ofile

otated PFM

plex structur

ence in the t ns. After in 11 is achiev diction of P ement after r

f templates,

on after refin

Ms

res were use template ( ncorporating ved. It show PFM. In Tab refinement.

, and

nement Similari

0.498

0.796

ed for valid ), they g the PFMs ws again tha

ble 4-2, it c . ( de denotes P

ity

ation.

y have s built at the can be enotes PFMs

(35)

PDB

1MD 1GT 1MN

3CO 2NN 3BP 1HL 1A0 1NK 2UZ 3G7 2PI 6PA 1YS 1B7 1MH

1TS 1D6 1ZM 1KB 1MN 1AK 2HA 1CD

Aver

Table 4-3

B ID

DMA TWA NNA O7C NYA PYA LOA

AA KPA ZKA 73A 0A AXA

AC 72B HDA

RB 66A MEC

B2A NMA KHA

APC WA rage

Difference

0.498 0.618 0.650 0.706 0.507 0.723 0.662 0.876 0.879 0.671 0.473 0.678 0.440 0.405 0.600 0.509 0.850 0.596 0.454 0.544 0.751 0.962 0.657 0.716 0.642

before and after PFM r

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

refinement b

0.535 0.660 0.689 0.904 0.639 0.853 0.751 0.862 0.957 0.835 0.556 0.659 0.547 0.375 0.669 0.695 0.960 0.994 0.611 0.587 0.433 0.962 0.622 0.716 0.716

by the know D

wledgebase Difference

0.037 0.042 0.039 0.198 0.132 0.130 0.089 -0.014 0.078 0.164 0.083 -0.019 0.107 -0.03 0.069 0.186 0.110 0.398 0.157 0.043 -0.318

0 -0.035

0 0.074

(36)

The

‘all-a make struc

The by T trans to ge predi propo all-at struc predi mode

In T all-at deno predi

Com meth (prot the p

contact mo atom model e comparis ctures of the

superimpos TM-align [1 sformed coo enerate a su

iction. The osed metho tom model ctures of the

iction. As a el has an av

Table 4-4, s tom model ote using na iction, respe

mpared with hod achieve tein structur proposed m

del propose l’ in the foll son with th ese 24 prote

sed structure 4]. The ori ordinates of

uperimpose templates od. As a re

has an av ese 24 prot a result, con verage simil

similarities are the alg ative structu

ectively.

h structure-b s a better p res without method with

ed in [5, 6]

owing cont he propose ins were us

es were con iginal prote f the query s ed protein-D

selected he esult, contac

verage simi teins were ntact model larity of 0.6

of differen gorithms de

ure and sup

based meth erformance DNA), the h the prote

and the all text) propos d method.

sed for PFM

nstructed by ein chains i

structure wa DNA comp ere are the ct model ha ilarity of 0

also used a l has an ave

89.

nt algorithm escribed in t

perimposed

hods with s e. For users ey could hav

ein sequenc

-atom know sed in [7] w

Superimpo M prediction

y applying t in the temp

as appended lex structur same as the as an avera 0.679. On t as input of erage simila

ms are inc this section structure a

superimpose have unbou ve better pr

ce of the

wledgebase ere implem osed structu

respectivel

the rotation late were r d into the te re for struc e templates age similari the other h

the two m arity of 0.7

luded. Con n. Native an as query for

ed structure und query p redicted PFM

unbound st

potential (c mented in ord tures and n

ly.

n matrix rep removed an emplate stru cture-based s selected b ity of 0.692 hand, the n methods for 16, and all-

ntact mode nd superimp

r structure-b

es, the prop protein struc Ms if they tructure. A

called der to native

ported nd the ucture PFM by the

2 and native

PFM -atom

l and posed based

posed ctures apply As for

(37)

struc same accom struc

cture-based e (or better,

mplishment cture-based

method wit compared t that the method.

th native st with all-ato

proposed,

tructures, se om model) , sequence

equence-bas performanc -based me

sed method ce as it does thod can

d can achiev s. This is a do as we

ve the great ell as

從一級結構預測DNA結合蛋白之標的序列

Predi

國立臺

Gr Coll

從一 icting ta

臺灣大學

raduate In lege of Ele

N

一級結構 arget seq

指導 Adv

學電機資 碩

nstitute of ectrical En

National T Ma

構預測 D quences

prim

Ch

導教授：

isor: Ye

Chien-Y

中華民 Ju

資訊學院 碩士論文

Computer ngineering

Taiwan U aster The

DNA 結合 of DNA mary stru

林志瑋 hih-Wei

歐陽彥 陳倩瑜 en-Jen O

Yu Chen

民國 100 uly, 201

院資訊工 文

r Science g and Com

Universit esis

合蛋白之 A-bindin ucture

瑋 Lin

彥正 博 瑜 博士 Oyang, P

n, Ph.D.

年 7 月 11

工程學研

Engineeri mputer Sci

ty

之標的序 ng prote

博士 士 Ph.D.

.

月

研究所

ing ience

序列

eins based on

誌謝

中文

文摘要

AB

BSTRA

CT

CO

ONTEN

NTS

Tables

Figure

es

Ch

hapter 1

Intr

oductio

on

Ch

2.1

2.1.

hapter 2

Algor speci

1 Predic

2 Lite

rithms ificities

cting the b

rature

for p

binding pre

Review

predictin

eference o

w

學電機資碩

資訊學院碩士論文

歐陽彥陳倩瑜 en-Jen O

院資訊工文

彥正博瑜博士 Oyang, P

博士士 Ph.D.