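Chung Hua University

Master's Thesis

Predict Protein-Protein Interactions and Protein Quaternary Structure

Department: Graduate Institute of Computer Science and Information Engineering

Student ID and name: M09302063 劉智偉

Advisor: Dr. 許文龍

August 2007 (Republic of China year 96)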

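Predict Protein-Protein Interactions and Protein Quaternary Structure: Chinese Abstract

In protein research, the interactions between proteins are very important: identifying the segments of a protein that participate in interactions can be applied to the development of new drugs. Developing a method to predict these segments is therefore important. Docking the tertiary structures of two interacting proteins, comparing how well they fit, and finding their quaternary structure is of great help in computer-aided drug development.

Our method uses the protein sequence and its secondary structure to build a mathematical model, trains the model on known protein data, and uses the resulting probabilities as the basis for prediction. The protein data are obtained from the Protein Data Bank (PDB). In experiments on these data, our method achieves an accuracy of 80%.

For quaternary structure, we rewrite the FTDock and RPDock programs with a genetic algorithm and use the previously trained model as the fitness function in place of MultiDock, in order to perform more precise structural docking computations.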


Predict Protein-Protein Interactions and Protein Quaternary Structure

ABSTRACT

In protein research, the interactions between proteins are important. New drugs can be discovered by finding the interaction sites in a protein. Therefore, it is important to develop an efficient method for predicting interaction sites. The docking result between two proteins also contributes greatly to computer-aided drug discovery.

Protein sequence and secondary structure are used to build a mathematical model, which is trained on a protein dataset. This model can then be used to predict protein interaction sites. All protein data files are retrieved from the Protein Data Bank (PDB). The accuracy of this approach reaches 80%.

To search docking angles at a finer resolution while controlling computation time, the source code of FTDock and RPDock is modified to use a genetic algorithm. The mathematical model developed in this dissertation serves as the fitness function. This approach can readily be used to replace MultiDock.


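Acknowledgments

I would first like to thank my advisor, Professor 許文龍, whose careful guidance and supervision made the completion of this thesis possible. During my two years in graduate school, in both academic research and scholarship, from establishing the research direction through the research process to writing the final draft, Professor 許 guided and advised me attentively, and this thesis could not have been completed otherwise.

I also thank Dr. 江維國 and Dr. 侯玉松 for their guidance and suggestions on the research, which made the structure and content of the thesis more substantial.

Finally, I thank my family. Your full support allowed me to concentrate on my research without worry and to complete my master's degree. I can only offer the completion of this thesis as thanks for everything you have done for me.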


List of Figures

Figure 1.1 Our docking approach

Figure 2.1 Flow diagram of overall docking method

Figure 2.2 Grid discretisation of molecules and calculation of surface complementarity

Figure 2.3 Matrix generated from 90 non-homologous interfaces

Figure 3.1 PDB protein primary structure files

Figure 3.2 PDB protein secondary structure files

Figure 3.3 PDB protein primary structure files in database

Figure 3.4 PDB protein secondary structure files in database

Figure 3.5 Interaction file

Figure 3.6 Non-interaction file

Figure 3.7 Interaction datasets in database

Figure 3.8 Initial statistic model

Figure 3.9 Steps of input 1D sequences

Figure 3.10 Steps of input 2D sequences

Figure 3.11 Test datasets

Figure 3.12 The logistic probabilities of a protein’s interaction sites

Figure 3.13 The ratio of row1 to row2

Figure 4.1 The discrete functions of two molecules

Figure 4.2 The gene vector

Figure 4.3 The flow diagram of our approach

List of Tables

Table 4.1 The result of 3D-Dock

Table 4.2 The result of our approach

Chapter 1 Introduction

1-1 Protein-Protein Interaction

Protein-protein interactions play an important role in protein function. Identifying protein-protein interaction sites and detecting the specific amino acid residues that participate in protein interactions is an important problem, with applications ranging from rational drug design to the analysis of metabolic and signal transduction networks.

Because the number of experimentally determined structures of protein-protein complexes is small, computational methods for identifying amino acids that participate in protein-protein interactions are becoming increasingly important [1] [2].

Completion of many genomes is being followed rapidly by large-scale efforts to identify interacting protein pairs experimentally, in order to explain the networks of interacting proteins. Experimental proteomics projects have already resulted in complete ‘interactomes’ [3] [4] [5]. While such efforts yield a catalog of interacting proteins, experimental detection of residues in protein-protein interaction surfaces must come from determination of the structure of protein-protein complexes.

However, determination of protein-complex structures using X-ray and NMR methods lags far behind the number of known protein sequences. Hence, there is a need for the development of reliable computational methods for identifying protein-protein interaction residues [6] [7] [8].


1-2 Protein-Protein Docking

Many cellular events involve the formation of protein–protein complexes.

Elucidation of the structural details of these complexes will undoubtedly contribute to our understanding of their functional properties, and thus is a major goal of structural biology [9] [10] [11] [12]. However, only a small fraction of experimentally determined structures are of protein–protein complexes [13]. Therefore, it is of substantial interest to develop computational docking methods that, given the structures of the individual component proteins, are able to assemble them into the complex in an accurate and reliable way.

Since the early 1990s, docking programs have been able to regenerate near-native structures of protein–protein complexes using the complexed (bound) conformations of the two proteins [14] [15] [16] [17]. The protein docking problem only becomes acute when docking the uncomplexed (unbound) conformations of the two proteins; these are the relevant states for true prediction [18] [19] [20] [21] [22] [23]. Although unbound proteins often adopt main-chain conformations similar to their bound counterparts, their solvent-exposed side chains commonly adopt conformations that are not complementary to their binding partner [24]. Thus, when docking algorithms generate near-native complexes from the unbound conformations of the partners, atoms in the interface clash. Such near-native fits score poorly in classical, atomistic energy potentials because of these clashing atoms. Even a single non-complementary atom can lead to very unfavorable energies because of the steepness of the steric repulsion term of the van der Waals energy [25].


1-3 The Overview of Our Approach

Figure 1.1 : Our docking approach. (Flow diagram with the boxes: setup interaction database; train statistic model; find protein interaction sites; modified FTDock; calculate fitness; choose superior; crossover; mutation; loop for 3000 generations.)

Our research has two parts, interaction prediction and docking, as shown in Figure 1.1. Because biologists believe that protein interaction is related to protein function [26], predicting interaction is important. All protein data files are collected from the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do).

Protein primary structure and secondary structure are used to determine protein interaction in our approach. The protein interaction dataset collected from Yan’s website (http://www.cs.iastate.edu/~yan330/p-p/p-p.htm) is used to train and test our statistical model. This model achieves 80% accuracy.

After predicting the protein interaction sites, we modify the FTDock [27] source code to use a genetic algorithm. Our statistical model replaces MultiDock [29] and serves as the fitness function.

1-4 Dissertation Organization

Background knowledge related to predicting protein-protein interactions and protein quaternary structure is given in Chapter 2. The approach to predicting protein interaction sites is described and evaluated in Chapter 3. The approach to predicting protein quaternary structure is then given and evaluated in Chapter 4. Finally, the conclusion and future work are discussed in Chapter 5.

Chapter 2 Background

2-1 Introduction to Protein Interaction

Previous studies have examined different aspects of interaction sites, such as hydrophobicity, residue propensities, size, shape, solvent accessibility, and residue pairing preferences. Although each of these parameters provides some information indicative of protein interaction sites, none of them perfectly differentiates the interface from the rest of the protein surface.

Based on different characteristics of known protein-protein interaction sites, several methods have been proposed for predicting interface residues using a combination of protein sequence and structural information. For example, based on their observation that proline residues occur frequently near interfaces, Kini and Evans [30] predicted potential protein-protein interaction sites by detecting the presence of “proline brackets”. Using this strategy, they identified the interaction sites between fibrinogen and 9E9, a monoclonal antibody which inhibits fibrin polymerization. Building on their systematic patch analysis of interaction sites, Jones and Thornton [31][32] successfully predicted interfaces in a set of 59 structures using a scoring function based on six parameters: solvation potential, residue interface propensity, hydrophobicity, planarity, protrusion, and accessible surface area. Gallet et al. [33] identified interacting residues using an analysis of sequence hydrophobicity based on a method previously developed by Eisenberg et al. [34] for detecting membrane and surface segments of proteins. Lu et al. [35] have developed statistical potentials for interfaces and used them in a structure-based multimeric threading algorithm to assign quaternary structures and predict protein interaction partners for proteins in the yeast genome.

Several groups have used neural networks to predict protein-protein interaction sites. Zhou and Shan [36] and Fariselli et al. [37] have independently used neural network algorithms to predict whether or not a residue is located in an interaction site using the spatial neighbors of the target residues as input, and achieved accuracy of 70% and 73%, respectively. Ofran and Rost [38] have successfully predicted protein-protein interaction sites using a neural network method based on their observations that the majority of protein-protein interaction residues are clustered on a sequence and that the protein-protein interfaces differ from the rest of the protein surface in residue composition.

2-2 The Software Suite of 3D-Dock

The growing number of individual protein structures in the databases and the relatively small number of solved complexes make predictive docking an important theoretical method. 3D-Dock is a suite of programs designed to enable computational prediction of protein-protein docking. The suite consists of FTDock, RPScore and MultiDock. A schematic of the overall approach is shown in Figure 2.1.

The FTDock algorithm is based on that of Katchalski-Katzir [39]. It discretises the two molecules onto orthogonal grids and performs a global scan of translational and rotational space. In order to scan rotational space, it is necessary to rediscretise one of the molecules (for speed, the smaller one) for each rotation. The scoring method is primarily a surface complementarity score between the two grids, as shown in Figure 2.2. To speed up the surface complementarity calculations, which are convolutions of two grids, Fourier transforms are used. This means that the convolutions are replaced with multiplications in Fourier space and, despite the need to perform the forward and reverse Fourier transforms, this decreases the overall computation required. Surface complementarity was the only score used in the original method. The original work on FTDock by Gabb [40] found an electrostatic filter to be a useful addition, and this is again implemented in the current version (though it can be turned off).
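The saving from working in Fourier space can be illustrated in one dimension. The following is a minimal Python sketch, not FTDock's actual implementation: it compares the direct circular correlation c[s] = sum over l of a[l] * b[l + s] with the same quantity obtained from an inverse transform of conj(FFT(a)) * FFT(b). The function names, the small radix-2 FFT helper and the toy vectors are illustrative assumptions; FTDock itself works on three-dimensional grids.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate_direct(a, b):
    """c[s] = sum_l a[l] * b[(l + s) mod n], computed in O(n^2)."""
    n = len(a)
    return [sum(a[l] * b[(l + s) % n] for l in range(n)) for s in range(n)]


def correlate_fft(a, b):
    """Same correlation via the correlation theorem, in O(n log n)."""
    n = len(a)
    fa, fb = fft(a), fft(b)
    product = [fa[k].conjugate() * fb[k] for k in range(n)]
    # The inverse transform needs the 1/n normalisation.
    return [v.real / n for v in fft(product, inverse=True)]


# Toy one-dimensional "grids" (hypothetical values, length 8).
a = [1, 2, 0, 0, 3, 0, 1, 0]
b = [0, 1, 0, 2, 0, 0, 1, 1]
direct = correlate_direct(a, b)
fast = correlate_fft(a, b)
best_shift = max(range(len(a)), key=lambda s: direct[s])
```

Both routes give the same scores up to floating-point error; the shift with the highest score plays the role of the best translation for one rotation of the mobile molecule.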

The RPScore program uses an empirical pair potential matrix to score each possible complex. The pair potentials are at the amino acid residue level. Each potential corresponds to the empirically derived likelihood of a trans-interface pair of two residue types, limited only by a distance cut-off [41]. The most useful matrix at present is generated from 90 non-homologous interfaces found in the PDB with the aid of SCOP 1.53 (http://scop.mrc-lmb.cam.ac.uk/scop/), and is shown graphically in Figure 2.3. If two interfaces are described as pairings of domains A-B and C-D, then the interfaces are defined as non-homologous when either A and C, or B and D, are homologous, but not both. Homology is in this case defined as being in the same `Superfamily' in the SCOP classification tree.
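A residue-level pair potential score of this kind can be sketched as follows. This is an illustration under stated assumptions, not the RPScore source: each residue is reduced to a single representative point, the cut-off of 4.5 is a hypothetical value, and the potential values in the example matrix are made up.

```python
import math


def pair_potential_score(residues_a, residues_b, pair_potential, cutoff=4.5):
    """Sum the potentials of all trans-interface residue pairs within cutoff.

    residues_a / residues_b: lists of (residue_type, (x, y, z)).
    pair_potential: dict keyed by an alphabetically sorted pair of types.
    """
    score = 0.0
    for type_a, xyz_a in residues_a:
        for type_b, xyz_b in residues_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                key = tuple(sorted((type_a, type_b)))
                score += pair_potential.get(key, 0.0)
    return score


# Hypothetical data for illustration only.
potential = {("ILE", "LEU"): 0.5, ("ASP", "LYS"): 0.8, ("ASP", "GLU"): -0.6}
side_a = [("LEU", (0.0, 0.0, 0.0)), ("ASP", (10.0, 0.0, 0.0))]
side_b = [("ILE", (3.0, 0.0, 0.0)), ("LYS", (12.0, 0.0, 0.0)),
          ("GLU", (30.0, 0.0, 0.0))]
score = pair_potential_score(side_a, side_b, potential)
```

Only the LEU-ILE and ASP-LYS pairs lie within the cut-off in this toy example, so the unfavourable ASP-GLU term does not contribute.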

The biological filter is a simple program to screen the complexes by requiring them to have a given chain or residue on one side of the interface within a certain distance of another chain or residue on the other side.

The program MultiDock was developed to provide a method for refining the interface between two proteins at the atomic level, given an initial docked complex generated by a docking algorithm or a manual docking procedure. The motivation for this work was to provide a rapid energy refinement protocol for the large number of putative docked complexes produced by rigid-body docking programs such as FTDock or DOCK. The program models the effects of side-chain conformational change and the rigid-body movement of the interacting proteins during refinement. The protein is described at the atomic level by electrostatic and van der Waals interactions, in which the side chains are modeled by a multiple copy representation according to a rotamer library on a fixed peptide backbone. MultiDock implements the self-consistent Mean Field Optimisation procedure (Koehl & Delarue 1994), but uses a full conventional molecular mechanics force field with scaling of van der Waals and electrostatic interactions for unrealistically close atomic contacts.

MultiDock also implements a rigid-body energy minimization routine, which is performed to relax the interface. MultiDock has been tested by refining large numbers of structures generated by FTDock on several protein-protein systems. The results are encouraging for protein-inhibitor systems. For a detailed description of the program and its application to protein-protein systems, see Jackson et al. 1998 [29].

Figure 2.1 : Flow diagram of overall docking method

Figure 2.2 : Grid discretisation of molecules and calculation of surface complementarity

Figure 2.3 : Matrix generated from 90 non-homologous interfaces

2-3 Genetic Algorithm

Genetic algorithms and evolutionary programming are well suited to docking problems because of their usefulness in solving complex optimization problems. The essential idea of genetic algorithms is the evolution of a population of possible solutions via genetic operators (mutation, crossover and migration) to a final population that optimizes a predefined fitness function. The process starts with encoding the variables, in this case the degrees of freedom, into a "genetic code", e.g. binary strings. Then a random initial population of solutions is created. Genetic operators are applied to this population, leading to a new population. This new population is then scored and ranked and, following "the survival of the fittest", the probability that a solution passes to the next iteration depends on its score. If the size of the population is kept constant, good solutions will come to dominate the population. It should be noted that genetic algorithms are well suited to parallel computing. Some programs using GAs are GOLD, AutoDock, DIVALI and DARWIN.

Genetic Algorithm:

begin genetic algorithm;
    generate an initial population;
    do
    {
        calculate fitness values;
        perform reproduction;
        {
            select individuals as parents;
            parent crossover;
        }
        replace pairs;
        apply mutation;
    } while stopping criteria not met;
end genetic algorithm.
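The pseudocode above can be turned into a small runnable Python sketch. It is a generic illustration, not the code used in this thesis: the tournament selection, the one-point crossover, the retention of the best individual and all parameter values are assumptions chosen for brevity, and the fitness function in the example simply counts the 1 bits.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=200, p_mut=0.02, seed=0):
    """Evolve bit strings (lists of 0/1) that maximise fitness; n_bits >= 2."""
    rng = random.Random(seed)
    # Generate an initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Calculate fitness values and keep the current best unchanged.
        ranked = sorted(population, key=fitness, reverse=True)
        children = [ranked[0][:]]
        while len(children) < pop_size:
            # Select parents: the fitter of two random individuals, twice.
            mother = max(rng.sample(population, 2), key=fitness)
            father = max(rng.sample(population, 2), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        # Replace the old population.
        population = children
    return max(population, key=fitness)


# Toy fitness: number of 1 bits in the string.
best = run_ga(fitness=sum, n_bits=16)
```

Because the best individual is copied into each new generation, the best fitness in the population never decreases from one generation to the next.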


Chapter 3 Predict Protein Interaction Sites

3-1 Setup Protein Primary and Secondary Structure Database

The Protein Data Bank (PDB) is the single worldwide repository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms, including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans. We downloaded all protein primary and secondary structure files from the PDB. These files contain sequence information only.

Figure 3.1 : PDB protein primary structure files

Figure 3.2 : PDB protein secondary structure files

To store these files in a database, we first process them with Perl scripts, which convert them into a format that can be loaded into our MySQL database.

Figure 3.3 : PDB protein primary structure files in database

Figure 3.4 : PDB protein secondary structure files in database


3-2 Interaction Datasets

After setting up the protein primary and secondary structure database, we set up the interaction database. We adopt the 77 proteins whose interaction sites were identified by Yan et al. [42] and download their interaction datasets, which consist of interaction files and non-interaction files.

Figure 3.5 : Interaction file


Figure 3.6 : Non-interaction file

We also convert these files into a format that can be loaded into our database. Because the interaction datasets record only the primary structure, we wrote a PHP program that uses the positions of the interaction sites in the primary (1D) sequence to find the corresponding sites in the secondary-structure (2D) sequence. The non-interaction datasets are processed in the same way. Our database now contains interaction-site and non-interaction-site tables that include both primary and secondary sequences. 90% of the interaction datasets are used to train our statistical model, and the remaining 10% are used to test it.
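The mapping from 1D to 2D interaction sites and the split into training and test sets can be sketched in Python (the thesis itself used PHP and MySQL). The function names, the example site positions and the random shuffle are assumptions made for illustration; the thesis does not state how the 10 test proteins were chosen.

```python
import random


def sites_with_secondary(primary, secondary, site_positions):
    """Pair each interaction-site residue with its secondary-structure code.

    primary and secondary are aligned strings of equal length;
    site_positions are 0-based indices into the primary sequence.
    """
    if len(primary) != len(secondary):
        raise ValueError("primary and secondary sequences must be aligned")
    return [(i, primary[i], secondary[i]) for i in site_positions]


def split_train_test(protein_ids, n_test, seed=0):
    """Shuffle the protein identifiers and hold out n_test of them."""
    ids = sorted(protein_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]


# Sequences taken from the example in Section 3-3; positions are hypothetical.
mapped = sites_with_secondary("VTAEAKKEN", "SCCCCHHHH", [0, 4, 8])

# 77 placeholder identifiers standing in for the proteins of the dataset.
train_ids, test_ids = split_train_test([f"protein_{i:02d}" for i in range(77)],
                                       n_test=10)
```

With 77 proteins and 10 held out, 67 remain for training, which matches the counts used in Sections 3-3 and 3-4.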


Figure 3.7 : Interaction datasets in database

3-3 Training Statistic Model

Figure 3.8 shows our initial statistical model. The interaction datasets from Yan et al. cover only 77 proteins, although many more protein interactions are known. In the future, we will download all the interaction datasets from INTERPARE (http://www.interpare.net) and build a protein interaction database, which we will then use to train a more complex statistical model.

Figure 3.8 : Initial statistic model

We now train the two models with 67 protein datasets (about 90% of the 77 protein datasets). Every interaction site is split into divisions of nine amino acids. When a division of an interaction site is input, the scores of the routes and of the amino acids in each block are incremented by ONE_PRI and SEC_PRI. For example, for the inputs VTAEAKKEN and SCCCCHHHH, the scores are updated as shown in Figure 3.9 and Figure 3.10. From these scores we can calculate the probability of every route and of every amino acid.
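One possible reading of this training step is a position-specific counting model, sketched below in Python. This is an interpretation, not the thesis code: it assumes that each of the nine positions is a "block" holding counts of the symbols seen there, that a "route" is the transition between the symbols of two neighbouring blocks, and that every observation adds one to the corresponding count.

```python
from collections import Counter


def train(divisions, width=9):
    """Count symbols per block and transitions (routes) between blocks."""
    block_counts = [Counter() for _ in range(width)]
    route_counts = [Counter() for _ in range(width - 1)]
    for division in divisions:
        if len(division) != width:
            continue  # skip divisions that are not exactly `width` long
        for i, symbol in enumerate(division):
            block_counts[i][symbol] += 1
        for i in range(width - 1):
            route_counts[i][(division[i], division[i + 1])] += 1
    return block_counts, route_counts


def to_probabilities(counters):
    """Normalise each counter so that its values sum to 1."""
    return [{key: count / sum(counter.values())
             for key, count in counter.items()}
            for counter in counters]


# Primary-structure example; the second and third divisions are hypothetical.
blocks, routes = train(["VTAEAKKEN", "VTAEAKKEN", "ATAEAKKEN"])
block_probs = to_probabilities(blocks)
route_probs = to_probabilities(routes)
```

The same functions can be applied to secondary-structure divisions such as SCCCCHHHH, and to the non-interaction data to obtain the second model.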


Figure 3.9 : Steps of input 1D sequences

Figure 3.10 : Steps of input 2D sequences


3-4 Test Statistic Model

Ten protein datasets (about 10% of the 77 protein datasets) are used to test our statistical model. After training, we have the logistic probability of every route and of every amino acid in a block. We input the test sets to evaluate the quality of our statistical model. First, we load the test datasets into our database in the same way as the training datasets (Figure 3.11).

Figure 3.11: Test datasets

Every division of the test sets is input to both our interaction model and our non-interaction model. The logistic probabilities of all the routes and amino acids in a block that the division matches are summed. In this way, each interaction test set is scored by both the interaction model and the non-interaction model.

3-5 Result

After the test sets are input to our interaction model and non-interaction model, we obtain the logistic probabilities of a protein’s interaction sites produced by our models (Figure 3.12).

Figure 3.12 : The logistic probabilities of a protein’s interaction sites

Row 1 contains the logistic probabilities that the interaction test datasets obtain from our interaction model, and row 2 contains the logistic probabilities that they obtain from our non-interaction model. To evaluate the predictive performance of our statistical model, we calculate the ratio of row 1 to row 2 (Figure 3.13).

Figure 3.13 : The ratio of row1 to row2

Row 3 gives the ratio of row 1 to row 2. If the ratio is greater than or equal to 1, the prediction is counted as a false positive (FP); otherwise it is counted as a true positive (TP). We then count the true positives and the false positives and compute the accuracy of our interaction model as

\[ \text{accuracy} = \frac{T_p}{T_p + F_p} \]

For our interaction model the accuracy is 188/234 = 80.3%.
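The decision rule and the accuracy computation can be written as a short Python sketch. It assumes that the summed scores in rows 1 and 2 are negative logarithmic probabilities, which would explain why a ratio below 1 favours the interaction model (the interaction score is then closer to zero); this reading and the example scores are assumptions, while the counts 188 and 234 are taken from the text.

```python
def classify(row1, row2):
    """Label each test division from the ratio row1 / row2.

    row1: summed scores from the interaction model.
    row2: summed scores from the non-interaction model (must be non-zero).
    """
    return ["FP" if r1 / r2 >= 1 else "TP" for r1, r2 in zip(row1, row2)]


def accuracy(true_positives, false_positives):
    """Accuracy as defined in the text: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Hypothetical scores for three divisions.
labels = classify([-10.0, -25.0, -8.0], [-12.0, -20.0, -8.0])

# Counts reported in the thesis: 188 true positives out of 234 divisions.
reported = accuracy(188, 234 - 188)
```

A ratio of exactly 1 is counted as a false positive, as stated in the text.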


Chapter 4 Predict Protein Quaternary Structure

4-1 FTDock

After building the mathematical model to predict protein interaction sites, the protein quaternary structure can be predicted in the next step. Our approach adopts FTDock (Fourier Transform Dock) from the 3D-Dock suite. FTDock performs rigid-body docking on two biomolecules in order to predict their correct binding geometry. One is a large molecule and the other is a smaller molecule that is docked against it (the two molecules are denoted by a and b). Each molecule is placed into a grid of dimension 128 × 128 × 128. Each molecule is thus given a digital representation (derived from its atomic coordinates) as a three-dimensional discrete function that distinguishes between the surface and the interior (Figure 4.1).

Figure 4.1: The discrete functions of two molecules

The FTDock algorithm evaluates shape complementarity. In FTDock, matching of surfaces is accomplished by calculating correlation functions. The correlation between the discrete functions a and b is defined as

\[ c_{\alpha,\beta,\gamma} = \sum_{l=1}^{N} \sum_{m=1}^{N} \sum_{n=1}^{N} a_{l,m,n} \cdot b_{l+\alpha,\, m+\beta,\, n+\gamma} \]

where α, β and γ are the numbers of grid steps by which molecule b is shifted with respect to molecule a in each dimension.

4-2 Modified FTDock

Depending on the size of the molecules, a typical docking run takes 9 to 36 hours, even though the fast Fourier transform greatly reduces the calculation time. To compute more precise docking angles, we modify FTDock with our method, using a genetic algorithm to implement the modified FTDock. Because MultiDock involves complex physical and chemical methods, we replace MultiDock with our modified FTDock. Our fitness function (using knowledge-based energy) is:

\[ \text{fitness} = \sum \frac{P_{fix}}{P_{ave}} + \sum \frac{P_{mov}}{P_{ave}} \]

The flow diagram of our approach is shown in Figure 4.3. The processing steps are discussed below:

1. We construct an initial population using three rotation angles as the gene vector shown in Figure 4.2. The population has 84 individuals, and every individual is a 26-bit vector that is divided into three angles.

2. Randomly choose two individuals as parents.

3. Randomly choose one of the three angles.

4. Perform the parental crossover.

5. Calculate the fitness functions of the two children.

6. Use the fitness function to produce superior descendants.

7. Randomly select an individual to mutate, and choose the superior child.

8. Repeat the same process for 3000 generations.

Figure 4.2 : The gene vector
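Steps 1 to 4 can be illustrated with a short Python sketch of the gene vector. The thesis states only that the 26 bits are divided into three angles; the 9 + 8 + 9 split, the mapping of each field onto the range 0 to 360 degrees, and the function names below are assumptions made for illustration.

```python
import random

ANGLE_BITS = (9, 8, 9)  # assumed split of the 26-bit gene vector


def decode(gene):
    """Convert a 26-bit gene vector into three angles in degrees."""
    angles, start = [], 0
    for width in ANGLE_BITS:
        field = gene[start:start + width]
        value = int("".join(str(bit) for bit in field), 2)
        angles.append(360.0 * value / 2 ** width)
        start += width
    return tuple(angles)


def crossover_one_angle(parent1, parent2, rng):
    """Swap the bits of one randomly chosen angle between two parents."""
    which = rng.randrange(len(ANGLE_BITS))
    start = sum(ANGLE_BITS[:which])
    end = start + ANGLE_BITS[which]
    child1 = parent1[:start] + parent2[start:end] + parent1[end:]
    child2 = parent2[:start] + parent1[start:end] + parent2[end:]
    return child1, child2


rng = random.Random(0)
half_turn = decode([1] + [0] * 25)  # first angle field set to 256 of 512
child1, child2 = crossover_one_angle([0] * 26, [1] * 26, rng)
```

Each child would then be scored with the fitness function, and the better one kept, as in steps 5 and 6.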

Figure 4.3 : The flow diagram of our approach

4-3 Performance Evaluation

Table 4.1 shows the results obtained by using the whole 3D-Dock suite, i.e. running FTDock, followed by RPScore, a biological filter (where available), and finally MultiDock. The ranks are those of the first correct docking, where a structure is considered correct if it is less than 3 Angstroms RMSD over all C-alpha atoms from the crystal structure.

Table 4.1: The result of 3D-Dock (RMS in Angstroms)

Test    Surface          Pair             + Filtering      + MultiDock      RMS of
System  Complementarity  Potential                                          rank 1
         Rank  RMS        Rank  RMS        Rank  RMS        Rank  RMS
1BRC      206  1.1           1  2.8           1  2.8           3  1.4          7.6
1CGI       89  2.7           2  2.8           1  2.8          92  2.7          6.9
2KAI       31  2.5         106  1.4          21  1.3          15  1.3          6.9
2SIC     1489  2.8          82  1.6           6  2.7           1  1.1          1.1
1BVK      368  1.6         272  2.2         250  2.2         208  3.0         10.6
1MLC     2375  2.6         182  1.8         125  1.8         326  3.0         14.1
1AHW      240  2.4         135  2.4         123  2.4           1  2.4          2.4
1WEJ      200  2.4          27  2.2          27  2.2          36  2.0         10.3
1BGS      542  2.9           3  2.1           3  2.1          72  1.5         10.7
1BDJ     4901  2.6        2661  2.4        2661  2.4           -  -           13.7
1DFJ        3  3.0        2956  2.6         941  2.6           -  -           16.7
1UGH      118  2.8         474  2.5          60  1.8           7  1.8          3.2
1WQ1      769  2.7        1941  2.8        1941  2.8           -  -           13.8
2PCC     1328  2.3         592  2.1         592  2.1           -  -           23.2
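The success criterion behind the Rank and RMS columns can be sketched in Python as follows. This is an illustration rather than the evaluation code used by 3D-Dock: it assumes that the predicted and crystal C-alpha coordinates are already expressed in the same frame and listed in the same order, and the coordinates and the middle RMSD value in the example are hypothetical.

```python
import math


def ca_rmsd(predicted, reference):
    """Root-mean-square deviation between matched C-alpha coordinates."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def first_correct_rank(ranked_rmsds, threshold=3.0):
    """Return the 1-based rank of the first structure below the threshold."""
    for rank, value in enumerate(ranked_rmsds, start=1):
        if value < threshold:
            return rank
    return None  # no correct docking in the list


# Two hypothetical atoms, displaced by 3 and 4 Angstroms respectively.
example = ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)])

# RMSDs of the top three structures, ordered by score (illustrative values).
rank = first_correct_rank([7.6, 5.0, 1.4])
```

In this example the first structure below 3 Angstroms is the third one, so the reported rank would be 3 while the RMS of rank 1 would be 7.6.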

We applied our method to the same proteins used for 3D-Dock. Table 4.2 lists the results of our modified FTDock.

Table 4.2 : The result of our approach (RMS in Angstroms)

Test    Modified FTDock
System  Rank  RMS
1BRC       1  23.7252857492
1CGI       1  59.6277349715
2KAI       1  10.6992020544
2SIC       1  5.71858633602
1BVK       1  59.0774992181
1MLC       1  43.6301266127
1AHW       1  19.0362683454
1WEJ       1  20.4132084495
1BGS       1  24.8226209237
1BDJ       1  30.3130366742
1DFJ       1  19.8909293093
1UGH       1  20.5796791415
1WQ1       1  4.40665509794
2PCC       1  17.721368144

We can now compare the results of 3D-Dock and our method. Comparing the RMS of the rank-1 structure, our method is better than 3D-Dock for only two proteins (1WQ1 and 2PCC). So that the results are comparable with 3D-Dock, the mobile protein is rotated only about nine thousand times in our method. A typical run of the modified FTDock took two days to a week on these proteins. If the number of rotations is increased in our method, the results should improve.

Chapter 5 Conclusion and Future Works

We have developed a method to predict protein interaction sites. Our algorithm, a Hidden Markov Model, addresses the problem with a statistical approach. We take advantage of the computer to process a large amount of biological data quickly. Several methods exist for predicting protein interaction sites, but many protein properties need to be considered. Our mathematical model is trained to obtain the probabilities of all its parameters, and it can then be used to predict protein interaction sites. The average accuracy of our method is above 80%. We then use the trained model to develop our modified FTDock, which replaces MultiDock.

In order to get better results of our program, further research works can be conducted in the future:

1. FTDock and RPScore could be rewritten as procedures to reduce disk access time.

2. Both positive model and negative model should be applied in fitness function.

3. To increase docking flexibility and accuracy, dynamic protein 3D structure must be adopted.

4. To improve the computation time, the modified dynamic 3D docking program can be implemented in PC cluster environment.


References

[1] Valencia A., Pazos F., (2002). “Computational methods for prediction of protein interactions”, Curr Opin Struc Biol 12:368-373.

[2] Teichmann S. A., Murzin AG, Chothia C., (2001). “Determination of protein function, evolution and interactions by structural genomics”, Curr Opin Struc Biol 11:354-363.

[3] Ho,Y., Gruhler,A., Heilbut,A., Bader,G.D., Moore,L., Adams,S., Millar,A., Taylor,P., Bennett,K., Boutilier,K. et al., (2002). “Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry”, Nature, 415, 180–183.

[4] Giot,L., Bader,J.S., Brouwer,C., Chaudhuri,A., Kuang,B., Li,Y., Hao,Y.L., Ooi,C.E., Godwin,B., Vitols,E. et al., (2003). “A protein interaction map of Drosophila melanogaster”, Science, 302, 1727–1736.

[5] Li,S., Armstrong,C.M., Bertin,N., Ge,H., Milstein,S., Boxem,M., Vidalain,P., Han,J.J., Chesneau,A., Hao,T. et al., (2004). “A map of the interactome network of the metazoan C. elegans”, Science, 303, 540–543.

[6] Teichmann,S.A., Murzin,A.G. and Chothia,C., (2001). “Determination of protein function, evolution and interactions by structural genomics”, Curr. Opin. Struct. Biol., 11, 354–363.

[7] Valencia,A. and Pazos,F., (2002). “Computational methods for prediction of protein interactions”, Curr. Opin. Struct. Biol., 12,368–373.

[8] Valencia,A. and Pazos,F., (2003). “Prediction of protein–protein interactions from evolutionary information”, In Bourne,P.E. and Weissig,H. (eds), Structural Bioinformatics. Wiley Inc., pp. 411– 426.

[9] Camacho, C.J. and Vajda, S., (2002). “Protein–protein association kinetics and protein docking”, Curr. Opin. Struct. Biol. 12: 36–40.

[10] Halperin, I., Ma, B., Wolfson, H., and Nussinov, R., (2002). “Principles of docking: An overview of search algorithms and a guide to scoring functions”, Proteins 47: 409–443.

[11] Smith, G.R. and Sternberg, M.J., (2002). “Prediction of protein–protein interactions by docking methods”, Curr. Opin. Struct. Biol. 12: 28–35.

[12] Vajda, S. and Camacho, C.J., (2004). “Protein–protein docking: Is the glass half-full or half-empty?”, Trends Biotechnol. 22: 110–116.

[13] Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E., (2000). “The Protein Data Bank”, Nucleic Acids Res. 28: 235–242.

[14] Cherfils, J., Duquerroy, S., and Janin, J., (1991). “Protein–protein recognition analyzed by docking simulation”, Proteins 11: 271–280.

[15] Shoichet, B.K. and Kuntz, I.D., (1991). “Protein docking and complementarity”, J. Mol. Biol. 221: 327–346.

[16] Hart, T.N. and Read, R.J., (1992). “A multiple-start Monte Carlo docking method”, Proteins 13: 206–222.

[17] Vakser, I.A., (1995). “Protein docking for low-resolution structures”, Protein Eng. 8: 371–377.

[18] Totrov, M. and Abagyan, R., (1994). “Detailed ab initio prediction of lysozyme–antibody complex with 1.6 A accuracy”, Nat. Struct. Biol. 1: 259–263.

[19] Vakser, I.A., Matar, O.G., and Lam, C.F., (1999). “A systematic study of low-resolution recognition in protein–protein complexes”, Proc. Natl. Acad. Sci. 96: 8477–8482.

[20] Jackson, R.M., Gabb, H.A., and Sternberg, M.J., (1998). “Rapid refinement of protein interfaces incorporating solvation: application to the docking problem”, J. Mol. Biol. 276: 265–285.

[21] Norel, R., Petrey, D., Wolfson, H.J., and Nussinov, R., (1999). “Examination of shape complementarity in docking of unbound proteins”, Proteins 36: 307–317.

[22] Camacho, C.J., Gatchell, D.W., Kimura, S.R., and Vajda, S., (2000). “Scoring docked conformations generated by rigid-body protein–protein docking”, Proteins 40: 525–537.

[23] Kimura, S.R., Brower, R.C., Vajda, S., and Camacho, C.J., (2001). “Dynamical view of the positions of key side chains in protein–protein recognition”, Biophys. J. 80: 635–642.

[24] Conte, L.L., Chothia, C., and Janin, J., (1999). “The atomic structure of protein–protein recognition sites”, J. Mol. Biol. 285: 2177–2198.

[25] Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S., and Weiner, P., (1984). “A new force field for molecular mechanical simulation of nucleic acids and proteins”, J. Am. Chem. Soc. 106: 765–784.

[26] Samanta, M. P. and Liang, S., (2003). “Predicting protein functions from redundancies in large-scale protein interaction networks”, PNAS, 100, 22, 12579-12583.

[27] Henry A. Gabb, Richard M. Jackson, Michael J. E. Sternberg, (1997). “Modelling Protein Docking using Shape Complementarity, Electrostatics and Biochemical Information”, J. Mol. Biol. 272: 106-120.

[28] Gidon Moont, Henry A. Gabb, Michael J. E. Sternberg., (1999). “Use of Pair Potentials Across Protein Interfaces in Screening Predicted Docked Complexes”, PROTEINS: Structure, Function, and Genetics 35:364–373.

[29] Richard M. Jackson, Henry A. Gabb, Michael J. E. Sternberg, (1998). “Rapid refinement of protein interfaces incorporating solvation: application to the docking problem”, J. Mol. Biol. 276: 265-285.

[30] Kini RM, Evans HJ, (1996). “Prediction of potential protein-protein interaction sites from amino acid sequence identification of a fibrin polymerization site”, FEBS Lett 385:81-86.

[31] Jones S., Thornton JM., (1997a). “Analysis of protein-protein interaction sites using surface patches”, J Mol Biol 272:121-132.

[32] Jones S., Thornton JM,. (1997b). “Prediction of protein-protein interaction sites using patch analysis”, J Mol Biol 272:133-143.

[33] Gallet X., Charloteaux B., Thomas A., Brasseur R., (2000). “A fast method to predict protein interaction sites from sequences”, J Mol Biol 302:917-926.

[34] Eisenberg D., Schwarz E., Komaromy M., Wall R., (1984). “Analysis of membrane and surface protein sequences with the hydrophobic moment plot”, J Mol Biol 179:125-142.

[35] Lu L., Lu H., Skolnick J., (2003). “Development of Unified Statistical Potentials describing Protein-protein interactions”, Biophy J 84:1895–1901.

[36] Zhou H., Shan Y., (2001). “Prediction of protein interaction sites from sequence profile and residue neighbor list”, Proteins 44:336-343.

[37] Fariselli P., Pazos F., Valencia A., Casadia R., (2002). “Prediction of protein-protein interaction sites in heterocomplexes with neural networks”, Eur J Biochem 269:1356-1361.

[38] Ofran Y., Rost B., (2003b). “Predicted protein-protein interaction sites from local sequence information”, FEBS Lett 544:236-239.

[39] E. Katchalski-Katzir, I. Shariv, M. Eisenstein, A. A. Friesem, C. Aflalo, and S. J. Wodak, (1992). “Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques”, Proc. Nat. Acad. Sci., 89:2195-2199.

[40] H. A. Gabb, R. M. Jackson, and M. J. E. Sternberg, (1997). “Modelling protein docking using shape complementarity, electrostatics, and biochemical information”, J. Mol. Biol., 272:106-120.

[41] G. Moont, H. A. Gabb, and M. J. E. Sternberg, (1999). “Use of pair potentials across protein interfaces in screening predicted docked complexes”, Proteins, 35(3):364-373.

[42] Yan, C., Dobbs, D., and Honavar, V., (2004). “A two-stage classifier for identification of protein-protein interface residues”, Bioinformatics 20(S1): i371-i378.
