
Chapter 2 Methods

2.1 Constructing three-dimensional probability density maps (PDMs) for

2.1.3 Predicting probability density maps (PDMs) of non-covalent interacting atoms

A probability density map (PDM) of a non-covalent interacting atom type is a three-dimensional distribution of the likelihood that atoms of that type appear around the amino acids on a protein surface. In this work, PDMs were reconstructed from the interacting atom pair databases described in the previous section for the 31 interacting atom types listed in Table 1.

To construct a PDM for an interacting atom type on a target protein surface, the algorithm first enclosed the target protein structure in a rectangular box extending at least 7 Å beyond the protein on every side. The box was then divided into a three-dimensional grid with a spacing of 0.5 Å. This grid spacing balances the resolution of the PDM against the computational resources needed to construct it. Grid points enclosed within the Connolly surface [47] of the target protein were masked so that no PDM values were assigned to them.
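The box and grid step can be sketched as follows. This is a minimal illustration, not the thesis code: the function names `grid_box` and `interior_points` are hypothetical, and the interior mask is deliberately simplified to "inside any atom's van der Waals sphere" rather than the Connolly surface used in the text, which additionally fills crevices that a solvent probe cannot reach.

```python
import math
from typing import List, Set, Tuple

Vec = Tuple[float, float, float]
Index = Tuple[int, int, int]


def grid_box(coords: List[Vec], margin: float = 7.0, spacing: float = 0.5):
    """Return the grid origin (minimum corner) and the number of points per axis.

    The box extends at least `margin` angstroms beyond the outermost atom on
    every side, and grid points are `spacing` angstroms apart."""
    origin = tuple(min(c[a] for c in coords) - margin for a in range(3))
    upper = [max(c[a] for c in coords) + margin for a in range(3)]
    # ceil() makes the last grid plane reach or pass the upper bound.
    dims = tuple(math.ceil((upper[a] - origin[a]) / spacing) + 1 for a in range(3))
    return origin, dims


def interior_points(coords: List[Vec], radii: List[float], origin: Vec,
                    dims: Tuple[int, int, int], spacing: float = 0.5) -> Set[Index]:
    """Grid indices excluded from the PDM.

    Simplification (assumption): a point is masked if it lies inside any
    atom's van der Waals sphere, standing in for the Connolly surface."""
    masked: Set[Index] = set()
    for center, r in zip(coords, radii):
        # Scan only the grid points inside this atom's bounding cube.
        lo = [max(0, math.floor((center[a] - r - origin[a]) / spacing)) for a in range(3)]
        hi = [min(dims[a] - 1, math.ceil((center[a] + r - origin[a]) / spacing)) for a in range(3)]
        for ix in range(lo[0], hi[0] + 1):
            for iy in range(lo[1], hi[1] + 1):
                for iz in range(lo[2], hi[2] + 1):
                    point = (origin[0] + ix * spacing,
                             origin[1] + iy * spacing,
                             origin[2] + iz * spacing)
                    d2 = sum((point[a] - center[a]) ** 2 for a in range(3))
                    if d2 <= r * r:
                        masked.add((ix, iy, iz))
    return masked
```

For example, two atoms at (0, 0, 0) and (1, 0, 0) Å give an origin of (-7, -7, -7) Å and a 31 × 29 × 29 grid at 0.5 Å spacing.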

Side-chain and main-chain torsion angles of all amino acids in the protein structure were calculated with MOLMAN2 and DSSP, respectively. For each amino acid residue X in the protein, the conformational type was assigned as the conformational cluster whose centroid had the smallest Euclidean distance to the torsion-angle vector of X. With a conformational type assigned to every amino acid in the protein structure, the non-covalent interacting atoms around each atom P in the protein structure were allocated from the database according to the atom type of P, the three-atom reference system P-R-Q described in the previous section, the amino acid type of the parent residue containing P, and the conformational type of that parent residue. Interacting atoms lying outside a sphere centered on P, whose radius equals the sum of the van der Waals radii of the interacting atom and P plus a tolerance of 0.5 Å, were not counted as interacting with P. The coordinates of the allocated interacting atoms were transformed into the coordinate system of the protein structure and mapped around the protein surface. Each non-covalent interacting atom was mapped only once, through the protein atom P to which its distance was shortest. In total, 31 PDMs were constructed from all the interacting atoms allocated for all the protein atoms (30 atom types) in the protein structure.
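Three of the allocation rules above can be sketched in a few lines: nearest-centroid assignment of the conformational type, the van der Waals distance cutoff with a 0.5 Å tolerance, and the "map each interacting atom only once, through its closest P" rule. This is an illustrative sketch under stated assumptions: the function names and the `(atom_id, protein_atom, distance)` record layout are hypothetical, torsion angles are compared as plain numbers (no periodic wrap-around), and the transformation from the P-R-Q reference frame into protein coordinates is not shown.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Vec = Tuple[float, float, float]


def assign_conformation(torsions: Sequence[float],
                        centroids: Dict[str, Sequence[float]]) -> str:
    """Return the cluster label whose centroid is nearest (Euclidean) to `torsions`.

    Angles are compared as plain numbers; a fuller version might wrap
    differences into [-180, 180) degrees."""
    return min(centroids, key=lambda label: math.dist(torsions, centroids[label]))


def within_cutoff(p: Vec, atom: Vec, r_p: float, r_atom: float,
                  tol: float = 0.5) -> bool:
    """Keep an interacting atom only if it lies within r_P + r_atom + tol of P."""
    return math.dist(p, atom) <= r_p + r_atom + tol


def map_once(records: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Map each interacting atom through the protein atom P closest to it.

    `records` holds hypothetical (interacting_atom_id, protein_atom_name,
    distance_to_P) tuples; duplicates of the same interacting atom reached
    through different P atoms are collapsed to the shortest distance."""
    best: Dict[str, Tuple[str, float]] = {}
    for atom_id, p_name, dist in records:
        if atom_id not in best or dist < best[atom_id][1]:
            best[atom_id] = (p_name, dist)
    return {atom_id: p_name for atom_id, (p_name, _) in best.items()}
```

With illustrative radii of 1.70 Å (carbon) and 1.52 Å (oxygen), the cutoff is 3.72 Å, so an oxygen 3.5 Å from a carbon P is kept and one 4.0 Å away is discarded.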

To keep PDMs high in information content and low in noise from irrelevant interactions, two strategies were implemented. First, allocating interacting atoms according to the amino acid conformational type (as described above) is crucial for retaining information content in the PDMs. An alternative approach that allocated interacting atoms from mixed amino acid conformational types would lose fidelity in the relative orientations of the interacting atoms, spreading the PDMs out around dihedral bonds. We found that mapping interacting atoms obtained from an atom in one amino acid conformational type onto the surroundings of the same atom in another conformational type led to serious spatial distortion of the interacting-atom distribution. Second, only interacting atomic pairs in the database were used for PDM construction. Atom pairs were recorded in the database using a distance threshold for proximity, but many of these proximal pairs arise from the covalent structure of the protein rather than from genuine non-covalent interactions, even within an otherwise stable structure. In this work, non-interacting atomic pairs were eliminated with the filter matrix shown in Table 2 [1].

Only atomic pairs with a matrix value less than -0.1 in Table 2 were considered interacting pairs, and only these interacting atoms were included in PDM construction.
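The filter rule reduces to a lookup and a threshold test. In this sketch the atom-type labels and matrix values are hypothetical placeholders, not the published Table 2 entries; the matrix is assumed symmetric (so pairs are looked up in sorted order), and pairs missing from the matrix are treated as non-interacting, which is also an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of a Table 2-style matrix, keyed by sorted atom-type
# pairs. The numbers are placeholders, not the values used in the thesis.
FILTER: Dict[Tuple[str, str], float] = {
    ("N", "O"): -0.45,
    ("C", "C"): -0.05,
    ("C", "O"): 0.12,
}
THRESHOLD = -0.1  # pairs must score strictly below this to count as interacting


def is_interacting(type_a: str, type_b: str,
                   matrix: Dict[Tuple[str, str], float] = FILTER,
                   threshold: float = THRESHOLD) -> bool:
    """True if the pair's matrix value is below the threshold.

    The matrix is assumed symmetric, so the pair is sorted before lookup;
    unknown pairs default to 0.0 and are therefore rejected."""
    key = tuple(sorted((type_a, type_b)))
    return matrix.get(key, 0.0) < threshold


def filter_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the atom pairs that pass the filter."""
    return [pair for pair in pairs if is_interacting(*pair)]
```

Under these placeholder values, an O-N pair passes, while C-C (-0.05) and C-O (0.12) are dropped because neither is below -0.1.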

PDMs were constructed by mapping the interacting atoms allocated from the database, as described in the previous paragraphs, onto the 3D grid. Each interacting atom was distributed over its 8 nearest grid points; each point's share was normalized by the database redundancy and was inversely proportional to the square of the distance from the atom to that grid point:

$$ v_{ji} = \frac{1}{n\,p_i}\cdot\frac{d_{ji}^{-2}}{\sum_{k=1}^{8} d_{ki}^{-2}} $$

where v_{ji} is the value accumulated at grid point j, one of the nearest grid points to interacting atom i; d_{ji} is the distance from grid point j to the center of interacting atom i; the grid points indexed k = 1, ..., 8 are the eight grid points nearest to atom i; n is the number of residues collected in the database for the amino acid of the target protein in the conformational type defined by its torsion-angle vector; and p_i is the background probability for atom type i to appear in all protein structures (for the water oxygen PDM, p_i = 1). The factor 1/n normalizes the interacting-atom density to one conformation for each residue in the target protein, and the background probability p_i normalizes the PDM by the frequency with which atom type i appears in proteins (except for water oxygen). The PDM for each interacting atom type was accumulated additively and was complete once every atom on the target protein surface had contributed.
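The distribution rule can be sketched directly from the definitions of v_{ji}, d_{ji}, n and p_i above, assuming the 8 nearest grid points are the corners of the grid cell containing the atom. The function names are hypothetical; the handling of an atom that sits exactly on a grid point (where the inverse-square weight is undefined) is an assumption, and masking of interior grid points is omitted for brevity.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

Vec = Tuple[float, float, float]
Index = Tuple[int, int, int]


def distribute(atom: Vec, origin: Vec, spacing: float,
               n: int, p_i: float) -> Dict[Index, float]:
    """Split one interacting atom over the 8 corners of its grid cell.

    Corner j receives (1 / (n * p_i)) * d_j**-2 / sum_k d_k**-2, so the
    eight shares add up to 1 / (n * p_i). If the atom lies exactly on a grid
    point, that point takes the whole share (an assumption)."""
    base = [math.floor((atom[a] - origin[a]) / spacing) for a in range(3)]
    corners = [(base[0] + dx, base[1] + dy, base[2] + dz)
               for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
    total = 1.0 / (n * p_i)
    inv_sq: Dict[Index, float] = {}
    for c in corners:
        d2 = sum((origin[a] + c[a] * spacing - atom[a]) ** 2 for a in range(3))
        if d2 == 0.0:
            return {c: total}
        inv_sq[c] = 1.0 / d2
    norm = sum(inv_sq.values())
    return {c: total * w / norm for c, w in inv_sq.items()}


def accumulate(pdm: DefaultDict[Index, float], shares: Dict[Index, float]) -> None:
    """Add one atom's shares into the running PDM for its atom type."""
    for idx, value in shares.items():
        pdm[idx] += value


# Usage: one PDM per interacting atom type, built up atom by atom.
pdm: DefaultDict[Index, float] = defaultdict(float)
accumulate(pdm, distribute((0.25, 0.25, 0.25), (0.0, 0.0, 0.0), 0.5, n=2, p_i=0.5))
```

An atom at the center of a cell is equidistant from all eight corners, so with n = 2 and p_i = 0.5 each corner receives 1/8 of the total share of 1.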

PDMs constructed for the 31 interacting atom types around the 20 natural amino acids in their various conformations are displayed online at http://ismblab.genomics.sinica.edu.tw/introduction/diaa/. Figure 1 shows a set of PDMs on the surface of an example protein.

Table 2 – A filter system used to eliminate non-interacting atomic pairs, based on the work of McConkey et al. [1] with modifications. During the construction of the PDMs, only atom pairs with a matrix value less than -0.1 were included in the probability density maps. Atom pairs whose matrix values are colored in red were not included in PDM construction.


Figure 1 – Probability density maps and encoded features of human vascular endothelial growth factor A (VEGF). The VEGF structure was taken from PDB ID 2FJG, chains V and W. The numbers 1 to 31 in the cells of the table correspond to the interacting atom types defined in Table 1 of the main text. The PDMs are shown as contours colored by interacting atom type: cyan for nitrogen, black for carbon, and magenta for oxygen. The contour level is set to 0.0005. In each cell, protein atoms are colored on a spectrum according to the corresponding a_{i,j} values, and solvent-inaccessible atoms are colored gray. An interactive 3-D presentation of the PDMs can be viewed on the web server at http://ismblab.genomics.sinica.edu.tw/ under Gallery.

2.2 Machine learning for probability density maps

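In: Using three-dimensional probability distributions of interacting atoms on protein surfaces to predict protein-protein interaction regions (pp. 23-32)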