
Chapter 2   Method and Materials

2.4  Scoring matrices


In this section, we first introduce the residue-based binding model of a protein–DNA complex. Based on this model, we then construct eight knowledge-based scoring matrices from crystal structures of protein–DNA complexes.

The residue-based binding model considers the interacting amino acid-nucleotide pairs of a protein–DNA complex. For a given protein–DNA complex, the model is usually represented as a contact profile, which consists of all interacting amino acid-nucleotide pairs. To model the binding affinity of such a contact profile, Mandel-Gutfreund and Margalit proposed a knowledge-based scoring matrix covering all possible amino acid-nucleotide pairs (80 pairs) and successfully modeled the binding free energy of zinc finger proteins [22].

We propose a residue-based binding model that incorporates two additional features. First, we model the interactions between the side chain and main chain of amino acids and the base and backbone of nucleotides, rather than considering only side chain-base interactions. Second, we model van der Waals forces, hydrogen bonds, and electrostatic interactions between interacting pairs.

Figure 1 shows an example of an interacting residue–nucleotide pair. A guanine base forms hydrogen bonds with an arginine side chain: hydrogen atoms on the arginine make two contacts with oxygen or nitrogen atoms on the major groove edge of the guanine ring.

The main chain atoms are the same in all 20 amino acids, whereas the side chain atoms vary. Similarly, the backbone atoms of a nucleotide (the phosphate backbone and the deoxyribose sugar) are the same in all four nucleotides, whereas the base atoms vary. For an amino acid–nucleotide pair, our model therefore considers four types of interaction: side chain to base (SS), side chain to backbone (SB), main chain to base (MS), and main chain to backbone (MB).
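The division of atoms into these four interaction types can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the atom-name sets assume standard PDB nomenclature, and the function names are hypothetical.

```python
# Assumption: standard PDB atom names. Main chain atoms of an amino acid.
MAIN_CHAIN_ATOMS = {"N", "CA", "C", "O", "OXT"}
# Assumption: phosphate atom names in current (OP1/OP2) and older (O1P/O2P) PDB files.
PHOSPHATE_ATOMS = {"P", "OP1", "OP2", "O1P", "O2P"}


def protein_group(atom_name):
    """Return 'M' for a main chain atom and 'S' for a side chain atom."""
    return "M" if atom_name in MAIN_CHAIN_ATOMS else "S"


def nucleotide_group(atom_name):
    """Return 'B' for a backbone atom (phosphate or sugar) and 'S' for a base atom.

    Sugar atoms are recognized by the trailing prime (or '*' in older files).
    """
    if atom_name in PHOSPHATE_ATOMS or atom_name.endswith(("'", "*")):
        return "B"
    return "S"


def interaction_type(aa_atom, nt_atom):
    """Combine both groups into one of the labels SS, SB, MS, or MB."""
    return protein_group(aa_atom) + nucleotide_group(nt_atom)


# Arginine NH1 contacting guanine O6 is a side chain to base interaction.
print(interaction_type("NH1", "O6"))
```

Hydrogen atoms are not treated separately in this sketch; a real pipeline would need to decide how to assign them.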

For each interaction type (SS, SB, MS, and MB) in an amino acid-nucleotide pair, we check for van der Waals forces, hydrogen bonds, and electrostatic interactions using the following criteria:

1. van der Waals force: any heavy atom of X is within 4.5 Å of any heavy atom of Y, where X is the set of main chain atoms or side chain atoms and Y is the set of base atoms or backbone atoms.

2. Hydrogen bond and electrostatic interaction: any atom of X forms a hydrogen bond or an electrostatic interaction with any atom of Y, with X and Y defined as in criterion 1. Hydrogen bonds and electrostatic interactions were determined using the open software HBPLUS [23].
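The distance test of the van der Waals criterion can be sketched as below. This is only an illustration under stated assumptions: atoms are given as plain (x, y, z) coordinates in Å, hydrogen atoms have already been excluded, the cutoff is treated as inclusive, and the function name is hypothetical. Hydrogen bonds and electrostatic interactions are not covered here, since they are taken from HBPLUS output.

```python
import math

VDW_CUTOFF = 4.5  # distance cutoff in angstroms, from the van der Waals criterion


def has_vdw_contact(atoms_x, atoms_y, cutoff=VDW_CUTOFF):
    """Return True if any heavy atom of X lies within `cutoff` of any heavy atom of Y.

    atoms_x, atoms_y: iterables of (x, y, z) coordinates of heavy atoms.
    Assumption: the cutoff is inclusive (distance <= cutoff).
    """
    return any(
        math.dist(a, b) <= cutoff
        for a in atoms_x
        for b in atoms_y
    )


# Hypothetical coordinates: a side chain group and a base group.
side_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
base = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(has_vdw_contact(side_chain, base))
```

For large complexes, a spatial index would avoid the all-against-all comparison, but the exhaustive loop keeps the criterion easy to read.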

Figure 2A shows the protein–DNA complex of CRP, a TF of E. coli (PDB code: 1zrc, helix-turn-helix motif of chain A) [24], which we use as an example to describe our residue-based binding model. For all residues of the protein (chain A) and all nucleotides of the DNA chains (chain W and chain X), we first divide the atoms of each residue into main chain and side chain groups and the atoms of each nucleotide into base and backbone groups. Based on criteria 1 and 2 above, we obtain the van der Waals pairs of the four amino acid-nucleotide interaction types (Vss, Vsb, Vms, and Vmb) and the special-force (hydrogen bond and electrostatic interaction) pairs of the four interaction types (Sss, Ssb, Sms, and Smb). The final contact profile of the protein is shown in Figure 2B.

We select co-crystallized protein–DNA complexes as the material for constructing our matrices according to the following criteria:

1. The resolution of the crystal structure must be smaller (better) than 3.0 Å.

2. The DNA crystallized in the complex must be double-stranded.

3. The DNA-binding protein chain must comprise more than 50 amino acids.

4. The number of interacting residues must be greater than 5.

5. We use BLASTCLUST to cluster protein–DNA complexes into the same group when their amino acid sequences share more than 30% sequence identity over 70% coverage.

6. Representative proteins are selected from each group by the aligned ratio of contact residue.

Finally, we obtain 349 protein–DNA complexes (listed in Table 1) as the material for constructing our scoring matrices.
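The first four selection criteria are simple per-complex filters and can be sketched as below. The record type, its field names, and the example entries are hypothetical and only illustrate the thresholds stated above; clustering with BLASTCLUST and the choice of representatives are not shown.

```python
from dataclasses import dataclass


@dataclass
class ComplexRecord:
    """Hypothetical summary of one protein-DNA complex (field names are assumptions)."""
    pdb_id: str
    resolution: float          # crystal structure resolution in angstroms
    double_stranded: bool      # True if the DNA in the complex is double-stranded
    chain_length: int          # number of amino acids in the DNA-binding chain
    n_contact_residues: int    # number of residues interacting with DNA


def passes_filters(record):
    """Apply selection criteria 1 to 4 to a single complex."""
    return (
        record.resolution < 3.0
        and record.double_stranded
        and record.chain_length > 50
        and record.n_contact_residues > 5
    )


# Made-up example records, not entries from Table 1.
candidates = [
    ComplexRecord("xxxA", 2.5, True, 120, 12),
    ComplexRecord("xxxB", 3.4, True, 200, 20),   # fails the resolution criterion
    ComplexRecord("xxxC", 2.0, False, 90, 8),    # fails the double-strand criterion
]
selected = [c.pdb_id for c in candidates if passes_filters(c)]
print(selected)
```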

2.4.1 Aligned ratio of contact residue

To select a representative protein from each group clustered by BLASTCLUST, we need to identify the protein that best represents the group. We use an index, the aligned ratio of contact residue (CR), defined as the ratio of the aligned contact residues to the total contact residues:

$$\text{aligned ratio of contact residue (CR)} = \frac{N_{AC}}{N_C}$$

where $N_C$ is the total number of contact residues and $N_{AC}$ is the number of contact residues aligned in the PSI-BLAST alignment.
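The CR index can be computed directly from two sets of residue positions, as in the sketch below. The function name and the representation of residues as integer positions are assumptions made for illustration; extracting the aligned positions from a PSI-BLAST alignment is not shown.

```python
def aligned_contact_ratio(contact_residues, aligned_residues):
    """Return CR, the fraction of contact residues that are aligned.

    contact_residues: positions of residues that contact DNA (N_C of them).
    aligned_residues: positions covered by the PSI-BLAST alignment.
    """
    contacts = set(contact_residues)
    if not contacts:
        raise ValueError("CR is undefined when there are no contact residues")
    n_aligned_contacts = len(contacts & set(aligned_residues))  # N_AC
    return n_aligned_contacts / len(contacts)


# Hypothetical positions: three of the four contact residues are aligned.
cr = aligned_contact_ratio([10, 12, 15, 20], [10, 11, 12, 13, 20])
print(cr)
```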

2.4.2 Knowledge-based scoring matrices

To obtain the scoring matrices, we first generate frequency tables for the eight interaction types (shown in Figure 3). We then calculate the log odds (log-likelihood ratio) of each amino acid-nucleotide pair to measure the interaction quantitatively. For an amino acid (i)-nucleotide (j) pair in a table, we obtain a score Sij by

$$S_{ij} = \ln\left(\frac{f_{ij}}{p_i \, p_j}\right)$$

where $f_{ij}$ is the frequency of the pair ij, $p_i$ is the background probability of residue i, and $p_j$ is the background probability of nucleotide j. We use the probability of each of the 20 amino acids occurring at the protein–DNA interface as its background probability. Likewise, the background probability of each of the 4 nucleotides is its probability of occurring at the protein–DNA interface. Figure 4 shows the final scores of the eight matrices.
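The log-odds calculation for one interaction type can be sketched as follows. The function name, the dictionary layout, and the toy numbers are assumptions for illustration. The text does not specify how unobserved pairs (zero counts) are handled, so this sketch simply leaves them out rather than guessing a pseudocount.

```python
import math


def log_odds_scores(pair_counts, p_residue, p_nucleotide):
    """Compute S_ij = ln(f_ij / (p_i * p_j)) for every observed pair.

    pair_counts: {(residue, nucleotide): count} for one interaction type.
    p_residue, p_nucleotide: background probabilities at the interface.
    """
    total = sum(pair_counts.values())
    scores = {}
    for (aa, nt), count in pair_counts.items():
        if count == 0:
            # ln(0) is undefined; the treatment of unobserved pairs is an open choice.
            continue
        f_ij = count / total
        scores[(aa, nt)] = math.log(f_ij / (p_residue[aa] * p_nucleotide[nt]))
    return scores


# Toy frequency table with two residues and two nucleotides (made-up counts).
counts = {("ARG", "G"): 30, ("ARG", "A"): 10, ("LYS", "G"): 10, ("LYS", "A"): 0}
scores = log_odds_scores(
    counts,
    p_residue={"ARG": 0.5, "LYS": 0.5},
    p_nucleotide={"G": 0.5, "A": 0.5},
)
print(scores[("ARG", "G")])
```

A positive score means the pair occurs more often than expected from the background probabilities, and a negative score means it occurs less often.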

2.4.3 Scoring method

Our knowledge-based scoring method calculates the binding affinity of a protein–DNA complex in the following steps. First, we obtain the contact profile of the complex (described in Section 2.4). Second, for all contact pairs of each interaction type, we obtain the scores from the corresponding scoring matrix. Finally, we use a linear combination of the eight interaction scores to represent the binding affinity of the protein–DNA pair. The binding affinity score is defined as follows:

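$$\text{Score} = w_1 V_{ss} + w_2 V_{sb} + w_3 V_{ms} + w_4 V_{mb} + w_5 S_{ss} + w_6 S_{sb} + w_7 S_{ms} + w_8 S_{mb}$$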

where w1~w8 denote the weights of the eight interaction scores. Figure 5 shows a flowchart for calculating the score of a protein–DNA complex.
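The three scoring steps can be sketched as a weighted sum over the eight interaction types. The data layout, the function name, the toy matrices, and the weight values below are all assumptions for illustration; in particular, scoring a pair that is missing from a matrix as 0 is a choice made here, not one stated in the text.

```python
# The eight interaction types: van der Waals (V) and special-force (S) pairs.
INTERACTION_TYPES = ("Vss", "Vsb", "Vms", "Vmb", "Sss", "Ssb", "Sms", "Smb")


def binding_score(contact_profile, matrices, weights):
    """Return the weighted sum of the eight interaction scores.

    contact_profile: {type: [(residue, nucleotide), ...]} for one complex.
    matrices: {type: {(residue, nucleotide): score}} scoring matrices.
    weights: {type: weight}, one weight per interaction type.
    """
    total = 0.0
    for itype in INTERACTION_TYPES:
        pairs = contact_profile.get(itype, [])
        # Assumption: pairs absent from a matrix contribute 0.
        type_score = sum(matrices[itype].get(pair, 0.0) for pair in pairs)
        total += weights[itype] * type_score
    return total


# Toy inputs: identical made-up matrices for every type, unit weights except Smb.
toy_matrix = {("ARG", "G"): 1.0, ("LYS", "A"): 0.5}
matrices = {itype: dict(toy_matrix) for itype in INTERACTION_TYPES}
weights = {itype: 1.0 for itype in INTERACTION_TYPES}
weights["Smb"] = 2.0
profile = {"Vss": [("ARG", "G"), ("LYS", "A")], "Smb": [("ARG", "G")]}
score = binding_score(profile, matrices, weights)
print(score)
```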
