[…] described in section 3.3.2.

To use the number of contact atoms of contact residues when merging the PFM of the sequence in the template with the PFM built by the knowledgebase, a term in the formula described in section 3.3.3 is redefined.

[Equation not recoverable from the source.]

Here the term is redefined as the number of contact atoms that are substituted over the total number of contact atoms of the position.

Although this is a nice idea, the two methods described above did not improve performance, whether they were used separately or together. The average similarity is still 0.71 on the validation set of protein-DNA complex structures when using the methods described above. This might be caused by the fact that most of the contact residues of the template were not substituted, so the information of the template sequence is usually kept. Thus slightly adjusting how the knowledgebase is used cannot provide significant improvement.

[…]6 The frequency of amino acids and nucleotides

While building the knowledgebase, which describes the preferences between amino acids of proteins and bases of nucleotides, the frequencies of amino acids and nucleotides were applied to calculate the expected contact frequency. In the final scheme, the frequencies of amino acids were obtained from Swiss-Prot, and the frequencies of nucleotides were set to 0.25. At first, however, the frequencies were obtained from the contact counts in the 260 non-redundant protein-DNA complex structures, so that the frequency of a type of amino acid or nucleotide amounts to its frequency obtained from the contact counts. The scores between amino acids and nucleotides are obtained by the same formula described in section 3.2.

The contact counts and scores under this scheme are shown in Figure 4-6 and Figure 4-5 respectively. It is observed that Arginine has many more contacts with nucleotides than other amino acids. However, under this scoring scheme, Arginine has lower scores. This is caused by the fact that Arginine has so many contacts that its expected contact count becomes very large, so Arginine gets lower scores. As a result, this scoring scheme is discarded, because it is believed that amino acids with more contacts should have higher scores.

Figure 4-6 Contact counts between amino acids and nucleotides (series: A, T, C, G)

Figure 4-7 Scores between amino acids and nucleotides under the older scheme (series: A, T, C, G)
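The effect described in this section, where a residue with many contacts is penalized once the background frequencies are taken from the contact counts themselves, can be illustrated with a small sketch. It assumes a log-odds score of observed over expected contact counts, which may differ from the formula of section 3.2. The contact counts and the fixed amino acid frequencies are hypothetical values chosen for illustration; only the nucleotide frequency of 0.25 comes from the text.

```python
import math


def frequencies_from_counts(counts, index):
    """Background frequency of each amino acid (index 0) or nucleotide (index 1),
    taken as its share of all observed contacts."""
    total = sum(counts.values())
    freq = {}
    for pair, n in counts.items():
        freq[pair[index]] = freq.get(pair[index], 0.0) + n / total
    return freq


def log_odds_scores(counts, aa_freq, nt_freq):
    """Assumed scoring form: log(observed contacts / expected contacts)."""
    total = sum(counts.values())
    scores = {}
    for (aa, nt), observed in counts.items():
        expected = total * aa_freq[aa] * nt_freq[nt]
        scores[(aa, nt)] = math.log(observed / expected)
    return scores


# Hypothetical contact counts: arginine contacts DNA far more often than alanine.
counts = {("ARG", "G"): 60, ("ARG", "A"): 20, ("ALA", "G"): 5, ("ALA", "A"): 15}

# Older scheme: background frequencies derived from the contact counts.
older = log_odds_scores(
    counts,
    frequencies_from_counts(counts, 0),
    frequencies_from_counts(counts, 1),
)

# Final scheme: fixed backgrounds. The amino acid values are hypothetical
# stand-ins for database frequencies; nucleotides are set to 0.25.
final = log_odds_scores(
    counts,
    {"ARG": 0.05, "ALA": 0.08},
    {"A": 0.25, "G": 0.25},
)

for pair in counts:
    print(pair, round(older[pair], 3), round(final[pair], 3))
```

With these numbers the older scheme scores the ARG-G pair below the ALA-A pair even though it has four times as many contacts, while the fixed backgrounds rank ARG-G highest.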
Chapter 5 Conclusions

A novel method for predicting target sequences of DNA-binding proteins based on primary structure is proposed in this thesis. Given a query protein primary structure, the proposed method predicts its target sequences based on a knowledgebase that describes the preference between amino acids of proteins and nucleotides of DNA sequences in protein-DNA complex structures.

In the process of this study, different methods and parameters were implemented to achieve better performance. Different distance cut-offs were implemented to build the knowledgebase of preference between amino acids and nucleotides. As a result, using […] Å as the distance cut-off achieves the best performance, which tells us that only residues close enough to DNA molecules can provide useful information. On the other hand, the number of atoms of a contact residue was considered an important issue. However, it did not have much influence on the performance of the proposed method, because most of the contact residues of the template were not substituted. Because contact residues are seldom substituted, it becomes important to select a proper template when the proposed method is applied. A template can be selected by sequence identity or e-value between two protein sequences, and it turns out that using e-value to select the template of a query achieves better performance.

As a result, the proposed method is shown to perform well when compared to structure-based methods [5-7]. However, users need a structure of the query protein to predict target sequences with a structure-based method. This thesis provides an easier way for users who want to know the target sequences of proteins. The proposed method was also compared with SABINE [8], which predicts target sequences of proteins by support vector regression (SVR). Although the proposed method cannot predict as well as SABINE does for the validation set, there are protein sequences that SABINE cannot predict but the proposed method can. This thesis provides another method for cases in which SABINE cannot predict a given protein sequence.
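The role of the distance cut-off and of the number of contact atoms discussed in this chapter can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: the function name `contact_residues`, the input layout, and the coordinates are hypothetical, and the 3.5 Å value is only a placeholder for the cut-off.

```python
import math


def contact_residues(protein_atoms, dna_atoms, cutoff):
    """Count, per residue, the atoms lying within `cutoff` angstroms of any DNA atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples, one per atom.
    dna_atoms:     list of (x, y, z) tuples.
    Residues with no atom inside the cut-off are left out of the result.
    """
    contacts = {}
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, dna_coord) <= cutoff for dna_coord in dna_atoms):
            contacts[residue_id] = contacts.get(residue_id, 0) + 1
    return contacts


# Hypothetical coordinates: two arginine atoms near the DNA, one alanine atom far away.
protein_atoms = [
    ("ARG12", (0.0, 0.0, 0.0)),
    ("ARG12", (1.0, 0.0, 0.0)),
    ("ALA40", (10.0, 0.0, 0.0)),
]
dna_atoms = [(0.0, 3.0, 0.0)]

print(contact_residues(protein_atoms, dna_atoms, cutoff=3.5))
```

A tighter cut-off keeps fewer contact atoms per residue and eventually drops residues altogether, which is why the choice of cut-off changes the contents of the knowledgebase.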
REFERENCE

1. Wrzodek, C., et al., ModuleMaster: A new tool to decipher transcriptional regulatory networks. Biosystems, 2010. 99(1): p. 79-81.
2. Rodionov, D.A., Comparative genomic reconstruction of transcriptional regulatory networks in bacteria. Chemical Reviews, 2007. 107(8): p. 3467-3497.
3. Bonneau, R., et al., The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 2006. 7(5).
4. Alamanova, D., P. Stegmaier, and A. Kel, Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies. BMC Bioinformatics, 2010. 11.
5. Morozov, A.V., et al., Protein-DNA binding specificity predictions with structural models. Nucleic Acids Research, 2005. 33(18): p. 5781-98.
6. Morozov, A.V. and E.D. Siggia, Connecting protein structure with predictions of regulatory sites. Proceedings of the National Academy of Sciences of the United States of America, 2007. 104(17): p. 7068-73.
7. Zhou, Y.Q., et al., An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins-Structure Function and Bioinformatics, 2009. 76(3): p. 718-730.
8. Schröder, A., et al., Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One, 2010. 5(11): p. e13876.
9. Berger, M.F., et al., Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 2008. 133(7): p. 1266-1276.
10. Alleyne, T.M., et al., Predicting the binding preference of transcription factors to individual DNA k-mers. Bioinformatics, 2009. 25(8): p. 1012-1018.
11. Wolber, G., et al., The Protein Data Bank (PDB), Its Related Services and Software Tools as Key Components for In Silico Guided Drug Discovery. Journal of Medicinal Chemistry, 2008. 51(22): p. 7021-7040.
12. Robertson, T.A. and G. Varani, An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins-Structure Function and Bioinformatics, 2007. 66(2): p. 359-374.
13. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997. 25(17): p. 3389-402.
14. Zhang, Y. and J. Skolnick, TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 2005. 33(7): p. 2302-2309.
15. Shindyalov, I.N. and P.E. Bourne, Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering, 1998. 11(9): p. 739-747.
16. Holm, L. and C. Sander, Protein-Structure Comparison by Alignment of Distance Matrices. Journal of Molecular Biology, 1993. 233(1): p. 123-138.
17. Kihara, D. and J. Skolnick, The PDB is a covering set of small protein structures. Journal of Molecular Biology, 2003. 334(4): p. 793-802.
18. Zhang, Y. and J. Skolnick, Scoring function for automated assessment of protein structure template quality. Proteins-Structure Function and Bioinformatics, 2004. 57(4): p. 702-710.
19. Matys, V., et al., TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes. Nucleic Acids Research, 2006. 34: p. D108-D110.
20. Tsai, H.K., et al., MYBS: a comprehensive web server for mining transcription factor binding sites in yeast. Nucleic Acids Research, 2007. 35: p. W221-W226.
21. Chan, W.M. and U. Consortium, The UniProt Knowledgebase (UniProtKB): a freely accessible, comprehensive and expertly curated protein sequence database. Genetics Research, 2010. 92(1): p. 78-79.
22. Chen, C.Y., W.C. Chung, and C.T. Su, Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies. Pattern Recognition, 2006. 39(12): p. 2356-2369.
23. Tsai, H.K., et al., Method for identifying transcription factor binding sites in yeast. Bioinformatics, 2006. 22(14): p. 1675-81.