Predicting Target Sequences of DNA-Binding Proteins from Primary Structure

… described in section 3.3.2.

… the number of contact atoms of contact residues to merging the PFM … sequence in the template and the PFM built by the knowledgebase together, … in the formula described in section 3.3.3 is redefined.

…

… is defined as the number of contact atoms which … is substituted over the total … contact atoms of position ….

Although this is a nice idea, the two methods described above did not improve the performance, whether they were used separately or together. The average similarity is still 0.71 on the validation set of protein-DNA complex structures when using the methods described above. This might be caused by the fact that most of the contact residues of the template were not substituted, so the information of the … sequence is usually kept. Thus, slightly adjusting how the knowledgebase is used cannot provide a significant improvement.

4.6 The frequency of amino acids and nucleotides

While building the knowledgebase, which describes the preferences between amino acids of proteins and bases of nucleotides, the frequencies of amino acids and nucleotides were applied to calculate the expected contact frequency. In the final scheme, the frequencies of amino acids were obtained from Swiss-Prot, and the frequencies of nucleotides were set to 0.25. At first, however, the frequencies were obtained from the contact counts in the 260 non-redundant protein-DNA complex structures: the frequency of a type of amino acid or nucleotide amounts to its frequency obtained from the contact counts. The scores between amino acids and nucleotides are obtained by the same formula described in section 3.2.

The contact counts and scores under this scheme are shown in Figure 4-6 and Figure 4-5 respectively. It is observed that Arginine has many more contacts with nucleotides than other amino acids. However, under this scoring scheme, Arginine has lower scores. This is caused by the fact that Arginine has so many contacts that its expected contact count becomes very large, so Arginine gets lower scores. As a result, this scoring scheme was discarded, because it is believed that amino acids with more contacts should have higher scores.

Figure 4-6 Contact counts between amino acids and nucleotides (series: A, T, C, G)

Figure 4-7 Scores between amino acids and nucleotides under the older scheme (series: A, T, C, G)
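The effect described for Arginine in section 4.6 can be illustrated with a small sketch. The formula of section 3.2 is not reproduced in this chapter, so the sketch assumes a plain log-odds score, log(observed / expected), with the expected count taken as total contacts times amino acid frequency times nucleotide frequency. The function names, the toy contact counts, and the two-residue background frequencies are all hypothetical and serve only to show the direction of the effect; they are not the values of the knowledgebase.

```python
import math

NUCLEOTIDES = ("A", "T", "C", "G")


def contact_derived_frequencies(counts):
    """Older scheme: frequencies taken from the contact counts themselves."""
    total = sum(sum(row.values()) for row in counts.values())
    aa_freq = {aa: sum(row.values()) / total for aa, row in counts.items()}
    nt_freq = {nt: sum(row[nt] for row in counts.values()) / total
               for nt in NUCLEOTIDES}
    return aa_freq, nt_freq


def log_odds_scores(counts, aa_freq, nt_freq):
    """Assumed score: log(observed / expected); requires non-zero counts."""
    total = sum(sum(row.values()) for row in counts.values())
    scores = {}
    for aa, row in counts.items():
        for nt in NUCLEOTIDES:
            expected = total * aa_freq[aa] * nt_freq[nt]
            scores[(aa, nt)] = math.log(row[nt] / expected)
    return scores


# Made-up counts: ARG contacts DNA far more often than ALA.
toy_counts = {
    "ARG": {"A": 40, "T": 40, "C": 40, "G": 80},
    "ALA": {"A": 5, "T": 5, "C": 5, "G": 5},
}

# Older scheme: expected counts follow the contact counts.
older = log_odds_scores(toy_counts, *contact_derived_frequencies(toy_counts))

# Final scheme: background amino acid frequencies (hypothetical stand-ins for
# Swiss-Prot values) and a uniform nucleotide frequency of 0.25.
background_aa = {"ARG": 0.5, "ALA": 0.5}
uniform_nt = {nt: 0.25 for nt in NUCLEOTIDES}
final = log_odds_scores(toy_counts, background_aa, uniform_nt)

for name, scores in (("older", older), ("final", final)):
    print(name, {pair: round(s, 3) for pair, s in scores.items()})
```

In this toy setting the older scheme gives the ARG-A pair a lower score than the ALA-A pair, because the large number of Arginine contacts inflates its own expected count. With background frequencies the ordering is reversed, which matches the belief that amino acids with more contacts should have higher scores.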

Chapter 5 Conclusions

A novel method for predicting target sequences of DNA-binding proteins based on primary structure is proposed in this thesis. Given the primary structure of a query protein, the proposed method predicts its target sequences based on a knowledgebase that describes the preference between amino acids of proteins and nucleotides of DNA sequences in protein-DNA complex structures.

In the process of this study, different methods and parameters were implemented to achieve better performance. Different distance cut-offs were used to build the knowledgebase of preferences between amino acids and nucleotides. As a result, using … Å as the distance cut-off achieves the best performance, which indicates that only residues close enough to DNA molecules can provide useful information. On the other hand, the number of atoms of a contact residue was considered an important issue. However, it does not have much influence on the performance of the proposed method, because most of the contact residues of the template were not substituted. Because contact residues are seldom substituted, it becomes important to select a proper template when the proposed method is applied. A template can be selected by sequence identity or by e-value between two protein sequences, and it turns out that using the e-value to select the template of a query achieves better performance.

As a result, the proposed method is shown to perform well when compared to structure-based methods [5-7]. However, users need a structure of the query protein to predict target sequences with a structure-based method. This thesis provides an easier method for users who want to know the target sequences of proteins. The proposed method was also compared with SABINE [8], which predicts target sequences of proteins by support vector regression (SVR). Although the proposed method cannot predict as well as SABINE does on the validation set, there are protein sequences that SABINE cannot predict but the proposed method can. This thesis provides another method for cases in which SABINE cannot predict a given protein sequence.

REFERENCE

1. Wrzodek, C., et al., ModuleMaster: A new tool to decipher transcriptional regulatory networks. Biosystems, 2010. 99(1): p. 79-81.
2. Rodionov, D.A., Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chemical Reviews, 2007. 107(8): p. 3467-3497.
3. Bonneau, R., et al., The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 2006. 7(5).
4. Alamanova, D., P. Stegmaier, and A. Kel, Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies. BMC Bioinformatics, 2010. 11.
5. Morozov, A.V., et al., Protein-DNA binding specificity predictions with structural models. Nucleic Acids Research, 2005. 33(18): p. 5781-5798.
6. Morozov, A.V. and E.D. Siggia, Connecting protein structure with predictions of regulatory sites. Proceedings of the National Academy of Sciences of the United States of America, 2007. 104(17): p. 7068-7073.
7. Zhou, Y.Q., et al., An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins-Structure Function and Bioinformatics, 2009. 76(3): p. 718-730.
8. Schröder, A., et al., Predicting DNA-binding specificities of eukaryotic transcription factors. PloS one, 2010. 5(11): p. e13876.
9. Berger, M.F., et al., Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 2008. 133(7): p. 1266-1276.
10. Alleyne, T.M., et al., Predicting the binding preference of transcription factors to individual DNA k-mers. Bioinformatics, 2009. 25(8): p. 1012-1018.
11. Wolber, G., et al., The Protein Data Bank (PDB), Its Related Services and Software Tools as Key Components for In Silico Guided Drug Discovery. Journal of Medicinal Chemistry, 2008. 51(22): p. 7021-7040.
12. Robertson, T.A. and G. Varani, An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins-Structure Function and Bioinformatics, 2007. 66(2): p. 359-374.
13. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997. 25(17): p. 3389-3402.
14. Zhang, Y. and J. Skolnick, TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 2005. 33(7): p. 2302-2309.
15. Shindyalov, I.N. and P.E. Bourne, Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering, 1998. 11(9): p. 739-747.
16. Holm, L. and C. Sander, Protein-Structure Comparison by Alignment of Distance Matrices. Journal of Molecular Biology, 1993. 233(1): p. 123-138.
17. Kihara, D. and J. Skolnick, The PDB is a covering set of small protein structures. Journal of Molecular Biology, 2003. 334(4): p. 793-802.
18. Zhang, Y. and J. Skolnick, Scoring function for automated assessment of protein structure template quality. Proteins-Structure Function and Bioinformatics, 2004. 57(4): p. 702-710.
19. Matys, V., et al., TRANSFAC (R) and its module TRANSCompel (R): transcriptional gene regulation in eukaryotes. Nucleic Acids Research, 2006. 34: p. D108-D110.
20. Tsai, H.K., et al., MYBS: a comprehensive web server for mining transcription factor binding sites in yeast. Nucleic Acids Research, 2007. 35: p. W221-W226.
21. Chan, W.M. and U. Consortium, The UniProt Knowledgebase (UniProtKB): a freely accessible, comprehensive and expertly curated protein sequence database. Genetics Research, 2010. 92(1): p. 78-79.
22. Chen, C.Y., W.C. Chung, and C.T. Su, Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies. Pattern Recognition, 2006. 39(12): p. 2356-2369.
23. Tsai, H.K., et al., Method for identifying transcription factor binding sites in yeast. Bioinformatics, 2006. 22(14): p. 1675-1681.
