Chung Hua University Master's Thesis

Academic year: 2022

Master's Thesis

Predict Interactions between Protein Residues for Developing Antibody Therapeutics

Department: Master's Program, Department of Bioinformatics
Student ID and name: M09720011 李佳烜
Advisor: Dr. 許文龍

August 2010 (Republic of China year 99)

 

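Abstract in Chinese

Antibody therapy is currently a very active research area. Because the human body naturally has antibodies, treating disease with antibodies causes far fewer side effects than chemical drugs. The treatment of many diseases, such as cancer, autoimmune diseases, and infectious diseases, is moving toward antibody therapy. Antibodies are proteins, so predicting interactions between protein sequences is important for developing protein-antibody therapies.

This thesis uses protein sequences and their secondary structures to build an HMM-like mathematical model.

First, we build the Positive database from protein interaction sites. We then locate the interaction hotspots of antibody proteins and select the non-interacting positions within those hotspots to build our Negative database. The model is trained on known protein data: 20% of the data is randomly selected to test the accuracy of the model, and the remaining 80% is used for training. The resulting probability values serve as the basis for prediction. The protein interaction data are obtained from INTERPARE. On these data, our method achieves an accuracy of 79.80%. The method can also predict protein functional sites and whether a protein will interact with other proteins.

Our method can predict antigen hotspots, and the Protinfo PPC method can then help us find new antibodies. In addition, antibodies can be linked to gold nanorice to mark the image location of target cells (for example, cancer cells), which facilitates laser treatment. Our method can predict whether an antibody will interact with proteins of healthy cells. When antibodies are mass-produced industrially with microorganisms, our method can predict whether the antibody interferes with the biochemical system of the microorganism.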

ABSTRACT

Protein-antibody therapeutics has recently become an active research topic. Because the human body already uses antibodies to defend against a variety of diseases, antibody therapeutics produces fewer side effects than traditional chemical medicines. Many diseases, such as cancer, autoimmune diseases, and infectious diseases, are treated with antibody therapeutics. An antibody is a protein, so predicting interactions between protein residues is important for developing protein-antibody therapeutics.

Protein sequences and their secondary structures are used to build an HMM-like mathematical model, which is trained on a protein dataset. We use protein-protein interaction sites to build our positive database. We then search for the interaction hotspots of antibody proteins and choose the non-interacting sites within them for the negative database. We randomly take 20% of the data for testing and 80% for training. All protein interaction data files are collected from INTERPARE. The accuracy of this approach reaches 79.80%. The model can also be used to predict protein functional sites and to predict whether a protein interacts with other proteins.

Antigen hotspots can be predicted with our model, and the Protinfo PPC method can then help us discover new antibodies. In addition, an antibody can be bound to gold nanorice so that it interacts with antigens on target cells (for example, cancer cells); the cancer cells can then be identified in an image and treated with a laser. Our method can predict whether the antibody also interacts with proteins of healthy cells. When bacteria are used to manufacture antibodies industrially, our method can predict whether the antibody interacts with proteins of the bacteria.

Keywords: Antibody Therapeutics, Predict Protein Residues Interactions, HMM.

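Acknowledgement

I would first like to thank my advisor, Professor 許文龍, for the careful guidance and encouragement that made the completion of this thesis possible. During my two years of graduate study, in both academic research and scholarship, from establishing the research direction through the research process to the final draft, Professor 許 gave me attentive guidance and reminders, without which this thesis could not have been completed. I also thank all the professors who taught me in my studies and coursework.

Finally, I thank my family. Your full support and financial assistance allowed me to concentrate on my research without worry and made it possible for me to complete my master's degree. I can only offer the completion of this thesis as thanks for everything you have done for me. Once again, I thank everyone who has taught and supported me; your guidance, teaching, and support have made me who I am today.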

Table of Contents

Abstract in Chinese
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Protein-Antibody Therapeutics
1-2 Computational Methods to Predict Interactions between Protein Residues
1-3 Our Approach to Predict Interactions between Protein Residues
1-4 Dissertation Organization
Chapter 2 Background
2-1 Introduce Antibody
2-2 INTERPARE Interacting Databases
2-3 Predict Protein-Protein Interaction
Chapter 3 Predict Interactions between Protein Residues
3-1 Setup Protein Primary and Secondary Structure Database
3-2 Datasets
3-2-1 Interaction Dataset
3-2-2 Non-Interaction Dataset
3-3 Training Statistic Model
3-4 Test Positive and Negative Statistic Models
3-5 Result
Chapter 4 Three Applications of Our Approach
4-1 Antibody Discovery
4-2 Nanorice Therapeutics and Industrial Manufacture
Chapter 5 Conclusion and Future Work
Reference
Appendix
Appendix A
Appendix B
Appendix C

List of Figures

Figure 1.1 Our approach
Figure 2.1 Antibody
Figure 2.2 InterPare Coverage
Figure 2.3 Protein Structural Interactome Map
Figure 2.4 Accessible Surface Area
Figure 2.5 Voronoi Diagram
Figure 3.1 PDB protein primary structure files
Figure 3.2 PDB protein secondary structure files
Figure 3.3 PDB protein primary and secondary structure files in database
Figure 3.4 The primary structure of 9 amino acid code
Figure 3.5 The secondary structure of 9 encode
Figure 3.6 Interaction Dataset in Positive Database
Figure 3.7 Non-interaction Dataset in Negative Database
Figure 3.8 Initial statistic model
Figure 3.9 Initial statistic model
Figure 3.10 Steps of input 1D sequences
Figure 3.11 Steps of input 2D sequences
Figure 3.12 Test datasets
Figure 3.13 Test model
Figure 4.1 Find hotspot method
Figure 4.2 Protinfo PPC
Figure 4.3 Antibody total score ranges vs. antibody total number
Figure 4.4 Calculate interacting score

List of Tables

Table 3.1 Logistic probabilities of a protein's interaction sites
Table 3.2 The Ratio
Table 4.1 Antigen's real interacting residues and predicted interacting residues
Table 4.2 Comparing antigen's real hotspots and predicted hotspots

Chapter 1 Introduction

Current developments in protein-antibody therapeutics are introduced in Section 1.1, and computational methods to predict interactions between protein residues are given in Section 1.2. Our approach to predicting interactions between protein residues is discussed in Section 1.3. Finally, the organization of the dissertation is described in Section 1.4.

1-1 Protein-antibody Therapeutics

Antibodies are important defense molecules in the human body, and the immune system acts like an elite force that blocks intruders. For this reason, antibodies are increasingly used to treat cancer and autoimmune diseases.

Since monoclonal antibody production technology was introduced in 1975, it has been applied to antibody drug development, which has progressed very quickly [1]. Once the genetic structure of antibodies was understood, antibodies could be modified by genetic engineering. This technique can eliminate the human body's rejection of the antibody and create new antibodies. Antibodies identify their targets strongly, with low side effects and good therapeutic effect. Monoclonal antibody drugs are currently used to treat cancer, autoimmune diseases, and infectious diseases. Their efficacy is especially prominent in cancer treatment, which has stimulated scientists' interest in the research and development of monoclonal antibodies.

According to a 2008 report [2], the US FDA had approved a total of 22 antibody drugs. At the same time, more than 100 antibodies worldwide had entered clinical research, and more than 500 antibody drugs were in preclinical research.

1-2 Computational Methods to Predict Interactions between Protein Residues

Protein-protein interactions play an important role in cells. Almost every physiological process within the cell, such as signal transduction, cell cycle regulation, drug design, regulation of enzyme activity, and intermediary metabolism, is associated with protein-protein interactions. Because the number of experimentally determined structures of protein-protein complexes is small, computational methods for identifying the amino acids that participate in protein-protein interactions are becoming increasingly important [3, 4].

Reliable computational methods for identifying protein-protein interaction residues are therefore needed [5, 6, 7]. Existing methods can be classified as follows: patch analysis using a six-parameter scoring function [8]; properties associated with interface topology [9]; analysis of the hydrophobicity distribution around a target residue [10]; charge distribution on interfaces [11]; multiple sequence alignments [12, 13]; structure-based multimeric threading [14]; docking methods that use potentials describing protein-protein interactions [15]; analysis of the characteristics of the spatial neighbors of a target residue using neural networks [16, 17, 18]; and analysis of the sequence neighbors of a target residue using a support vector machine (SVM) and a Bayesian classifier [19].

1-3 Our Approach to Predict Interactions between Protein Residues

Instead of the two stages used in [19], our approach uses a single HMM-like statistical model. This model processes both 1D and 2D sequences to predict whether each residue interacts. Propagation probabilities between consecutive amino acid codes are also considered. The block diagram of our approach is shown in Figure 1.1.

The function of each block is described below:

1. The 10,109 proteins collected from INTERPARE are used to build our positive MySQL database. Non-interaction sites in the hotspots of 420 antibody proteins are chosen to build the negative MySQL database.

2. Protein sequences and their secondary structures are used to build a mathematical model. This model is trained into positive and negative models on a randomly selected 80% of the positive and negative databases generated in Step 1. The remaining 20% of the databases is used for testing. The accuracy of this approach reaches 79.80%.

3. These models can further be used to predict antigen functional sites. The fragments at the antigen functional sites are then used to discover new antibodies. The positive model can predict whether a protein interacts with other proteins, which is significant for nanorice therapeutics and industrial manufacture.
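The random 80%/20% split in Step 2 can be sketched as follows. This is only an illustration: the function name `split_dataset`, the fixed seed, and the placeholder fragment IDs are assumptions, not part of the thesis.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical fragment IDs standing in for rows of the positive or negative database.
fragments = [f"frag_{i}" for i in range(10)]
train, test = split_dataset(fragments)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which helps when the positive and negative models are retrained and compared.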

Figure 1.1: Our approach.

1-4 Dissertation Organization

Antibodies, the INTERPARE interaction databases, and methods for predicting protein-protein interactions are described in Chapter 2. Our approach to predicting protein interactions is described and evaluated in Chapter 3. The applications of our approach are described in Chapter 4. Finally, the conclusion and future work are discussed in Chapter 5.

Chapter 2 Background

The structure and function of antibodies are introduced in Section 2.1. The INTERPARE databases, which are derived from three algorithms, are used to build our statistical model; these algorithms are presented in Section 2.2. Methods for predicting protein-protein interactions are classified in Section 2.3.

2-1 Introduce Antibody

Antibodies (also known as immunoglobulins [20], abbreviated Ig) are gamma globulin proteins that are found in blood or other bodily fluids of vertebrates, and are used by the immune system to identify and neutralize foreign objects, such as bacteria and viruses. They are typically made of basic structural units each with two large heavy chains and two small light chains [21].

Figure 2.1: Antibody.

Though the general structure of all antibodies is very similar, a small region at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures, or antigen binding sites, to exist. This region is known as the hypervariable region. Each of these variants can bind to a different target, known as an antigen [22]. This huge diversity of antibodies allows the immune system to recognize an equally wide variety of antigens. The unique part of the antigen recognized by an antibody is called the epitope. These epitopes bind with their antibody in a highly specific interaction, called induced fit that allows antibodies to identify and bind only their unique antigen in the midst of the millions of different molecules that make up an organism. Recognition of an antigen by an antibody tags it for attack by other parts of the immune system. Antibodies can also neutralize targets directly [23].

2-2 INTERPARE Interacting Databases

The InterPare [24] interaction databases are calculated with three algorithms: PSIMAP, ASA, and Voronoi. As shown in Figure 2.2, the 10,109 proteins derived through these three algorithms are used to build our positive MySQL database. The three algorithms are described below:

Figure 2.2: InterPare Coverage. Each number represents the number of pdb entries

1. PSIMAP:

Protein Structural Interactome map (PSIMAP) is a global interaction map that describes domain–domain and protein–protein interaction information for known Protein Data Bank structures. It calculates the Euclidean distance to determine interactions between possible pairs of structural domains in proteins [25].

Figure 2.3: Protein Structural Interactome Map (PSIMAP)
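A distance-based interaction test of the kind PSIMAP performs can be sketched as below. The cutoff of 5.0 and the minimum of 5 contacts are illustrative assumptions rather than values stated in this thesis, and `domains_interact` is a hypothetical helper, not PSIMAP code.

```python
import math

def domains_interact(atoms_a, atoms_b, cutoff=5.0, min_contacts=5):
    """Return True if at least min_contacts atom pairs lie within cutoff."""
    contacts = sum(
        1
        for a in atoms_a
        for b in atoms_b
        if math.dist(a, b) <= cutoff  # Euclidean distance between two atoms
    )
    return contacts >= min_contacts

# Hypothetical 3D coordinates for atoms of three domains.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
domain_c = [(0.0, 50.0, 0.0)]
print(domains_interact(domain_a, domain_b))
print(domains_interact(domain_a, domain_c))
```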

2. Accessible Surface Area:

The Accessible Surface Area (ASA) method detects protein regions that become buried, and thus excluded from the solvent, when a multimer or complex forms. When two or more subunits interact or aggregate, they lose some of the area that was accessible to solvent in the free subunit or domain [26].

Figure 2.4: Accessible Surface Area (ASA)
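The ASA criterion amounts to comparing per-residue accessible area before and after complex formation. The sketch below assumes the ASA values have already been computed by some external tool; the `min_loss` threshold and all numbers are hypothetical.

```python
def buried_residues(asa_free, asa_complex, min_loss=1.0):
    """Return residues whose accessible surface area drops on complex formation."""
    return sorted(
        res
        for res, free in asa_free.items()
        if free - asa_complex.get(res, free) >= min_loss
    )

# Hypothetical per-residue ASA values for one subunit, free and in the complex.
asa_free = {10: 85.0, 11: 40.0, 12: 0.0, 13: 62.5}
asa_complex = {10: 20.0, 11: 40.0, 12: 0.0, 13: 61.9}
print(buried_residues(asa_free, asa_complex))
```

Residue 13 loses less area than the threshold, so only residue 10 is reported as part of the interface in this toy input.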

3. Voronoi Diagram:

In mathematics, a Voronoi diagram is a special kind of decomposition of a metric space determined by distances to a specified discrete set of objects in the space, e.g., by a discrete set of points [27].

In the simplest case, we are given a set of points S in the plane, which are the Voronoi sites. Each site s has a Voronoi cell, also called a Dirichlet cell, V(s) consisting of all points closer to s than to any other site. The segments of the Voronoi diagram are all the points in the plane that are equidistant to the two nearest sites. The Voronoi nodes are the points equidistant to three (or more) sites [28].
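The cell definition above translates directly into a nearest-site lookup. The following is a minimal sketch with made-up planar sites; it does not construct the diagram itself, only decides which cell a query point falls in.

```python
import math

def voronoi_cell_index(point, sites):
    """Return the index of the site whose Voronoi cell contains point."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

# Hypothetical Voronoi sites in the plane.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(voronoi_cell_index((1.0, 1.0), sites))
print(voronoi_cell_index((3.5, 0.5), sites))
```

For these three sites, the point (2, 2) is equidistant to all of them and is therefore a Voronoi node in the sense described above.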

2-3 Predict Protein-protein Interaction

Interactions between proteins are important for numerous biological functions. For example, signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions of the signaling molecules. This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases. Proteins may interact for a long time to form part of a protein complex, a protein may carry another protein, or a protein may interact briefly with another protein just to modify it (for example, a protein kinase adds a phosphate to a target protein). Such modification of proteins can itself change protein-protein interactions. In short, protein-protein interactions are of central importance for virtually every process in a living cell. Information about these interactions improves our understanding of diseases and can provide the basis for new therapeutic approaches [29, 30, 31, 32]. Several methods exist for predicting protein-protein interactions [33]:

1. Phylogenetic profiling

Phylogenetic profiling [34] finds pairs of protein families with similar patterns of presence or absence across large numbers of species. This method identifies pairs likely to act in the same biological process, but does not necessarily imply physical interaction.
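As an illustration of profile comparison, the sketch below counts the species in which two presence/absence profiles disagree. The Hamming-style distance and the example profiles are assumptions chosen for simplicity; published methods may use other similarity measures.

```python
def profile_distance(profile_a, profile_b):
    """Count the species in which exactly one of the two families is present."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    return sum(a != b for a, b in zip(profile_a, profile_b))

# Hypothetical presence (1) / absence (0) profiles across six species.
family_x = [1, 0, 1, 1, 0, 1]
family_y = [1, 0, 1, 1, 0, 0]
family_z = [0, 1, 0, 0, 1, 1]
print(profile_distance(family_x, family_y))
print(profile_distance(family_x, family_z))
```

A small distance (families X and Y here) marks a pair with similar patterns of presence and absence, which is the signal this method looks for.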

2. Prediction of co-evolved protein pairs based on similar phylogenetic trees

This method [35] involves using a sequence search tool such as BLAST for finding homologues of a pair of proteins, then building multiple sequence alignments with alignment tools. From these multiple sequence alignments, phylogenetic distance matrices are calculated for each protein in the hypothesized interacting pair. If the matrices are sufficiently similar they are deemed likely to interact.
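One common way to quantify how similar two distance matrices are is the Pearson correlation of their upper triangles. The sketch below assumes that measure and uses made-up matrices over the same four species; the choice of correlation and any acceptance threshold are assumptions, not details given in the text.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical phylogenetic distance matrices for two protein families.
dist_a = [[0, 1, 2, 4], [1, 0, 2, 4], [2, 2, 0, 3], [4, 4, 3, 0]]
dist_b = [[0, 2, 4, 8], [2, 0, 4, 8], [4, 4, 0, 6], [8, 8, 6, 0]]
r = pearson(upper_triangle(dist_a), upper_triangle(dist_b))
print(round(r, 3))
```

Here the second matrix is an exact multiple of the first, so the correlation is at its maximum; real families would give lower values.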

3. Identification of homologous interacting pairs

This method [36] consists of searching whether the two sequences have homologues which form a complex in a database of known structures of complexes.

The identification of the domains is done by sequence searches against domain databases such as Pfam using BLAST. If more than one complex of Pfam domains is identified, then the query sequences are aligned using a hidden Markov tool called HMMER to the closest identified homologues, whose structures are known. Then the alignments are analyzed to check whether the contact residues of the known complex are conserved in the alignment.

4. Identification of structural patterns

This method [37, 38] builds a library of known protein-protein interfaces from the PDB, where interfaces are defined as pairs of polypeptide fragments whose separation is below a threshold slightly larger than the Van der Waals radius of the atoms involved. The sequences in the library are then clustered based on structural alignment, and redundant sequences are eliminated. Residues that occur with a high frequency (generally >50%) at a given position are considered hotspots [39]. This library is then used to identify potential interactions between pairs of targets, provided that they have a known structure.
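The frequency criterion for hotspots can be illustrated on a small cluster of aligned fragments. The fragments, the gap symbol, and the helper `conserved_positions` are hypothetical; only the ">50% at a given position" rule comes from the text.

```python
from collections import Counter

def conserved_positions(aligned_fragments, min_frequency=0.5):
    """Return {column: residue} where one residue exceeds min_frequency."""
    hotspots = {}
    for col, column in enumerate(zip(*aligned_fragments)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) > min_frequency:
            hotspots[col] = residue
    return hotspots

# Hypothetical aligned interface fragments ("-" marks a gap).
cluster = ["WKYDL", "WRYEL", "WKF-L", "AKYDV"]
print(conserved_positions(cluster))
```

Column 3 is not reported because its most frequent residue reaches exactly 50%, which does not exceed the threshold.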

5. Bayesian network modeling

Bayesian methods [40] integrate data from a wide variety of sources, including both experimental results and prior computational predictions, and use these features to assess the likelihood that a particular potential protein interaction is a true positive result. These methods are useful because experimental procedures, particularly the yeast two-hybrid experiments, are extremely noisy and produce many false positives, while the previously mentioned computational methods can only provide circumstantial evidence that a particular pair of proteins might interact.

6. 3D template-based protein complex modeling

This method [41, 42, 43, 44] makes use of known protein complex structures to predict, as well as structurally model, interactions between query protein sequences. The prediction process generally starts by employing a sequence-based method to search for protein complex structures that are homologous to the query sequences. These known complex structures are then used as templates to structurally model the interaction between the query sequences. This method has the advantage of not only inferring protein interactions but also suggesting models of how proteins interact structurally, which can provide some insight into the atomic-level mechanism of that interaction. On the other hand, the ability of this method to make a prediction is limited by the relatively small number of known protein complex structures.

Chapter 3 Predict Interactions between Protein Residues

The protein sequence and secondary structure database is set up in Section 3.1. In Section 3.2, the INTERPARE interaction datasets are reconstructed as fragments of 9 amino acids, and both the 1D and 2D sequences of the fragments are stored in a MySQL database. Our positive and negative models are built in Section 3.3 and tested in Section 3.4. The test results are given in Section 3.5.

3-1 Setup Protein Primary and Secondary Structure Database

The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life found in all organisms, including bacteria, yeast, plants, flies, humans, and other species. Understanding the shape of a molecule helps us understand how it works, and this knowledge can help deduce the role of a structure in human disease and in drug development. We therefore downloaded the primary and secondary structures of all protein sequences from the PDB.

Figure 3.1 and Figure 3.2 show primary and secondary structure of proteins.

Figure 3.1: PDB protein primary structure files

Figure 3.2: PDB protein secondary structure files

To make the PDB files easier to store in the database, we use UltraEdit to modify the file format. We then upload the files to our MySQL databases.

Figure 3.3: PDB protein primary structure files (sequence) and secondary structure files (secstr) in database.

3-2 Datasets

The acquisition of the interaction dataset is described in Section 3.2.1, and the non-interaction dataset is collected in Section 3.2.2.

3-2-1 Interaction Dataset

After setting up the protein primary and secondary structure sequence databases, we set up the interaction databases. INTERPARE [24] uses the PSIMAP, ASA, and Voronoi algorithms to confirm protein interactions. The 10,109 proteins confirmed by these algorithms are collected from INTERPARE.

We also convert these files to a format that can be used to construct our database. Because the interaction datasets contain only the primary structure, we use a Java program to map the 1D interaction sites onto the 2D (secondary structure) sequence. Each protein has many interaction positions. For each one, the interaction position is placed at the center, and the four amino acid codes before it and the four after it are captured as well. This gives a nine-amino-acid fragment of both the primary structure and the secondary structure. This nine-amino-acid approach is the same as that of C. Yan in [19].

For example, the protein with PDB ID 12e8:P has an interaction site at position 11. Its primary structure sequence is given in Figure 3.4 and its secondary structure in Figure 3.5. The amino acid code at interaction site 11 is V. We therefore obtain the primary structure fragment SGAEVVRSG and the secondary structure fragment □□□EEEETT (□ denotes a space).

The 8 conformational states of protein secondary structure are reduced to 3 states: Helix = H, G, I; Sheet = E, B; and Coil = S, T, C. For ease of presentation, helix is represented as A, sheet as B, and coil as C. Therefore, □□□EEEETT is displayed as CCCBBBBCC.
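The fragment extraction and state reduction described above can be sketched in a few lines. The thesis uses a Java program for this step; the Python below is only an illustration. The example chain is hypothetical and was chosen so that site 11 reproduces the fragments quoted in the text. Treating a blank as coil follows the worked example, and returning `None` for sites too close to a chain end is an assumption, since the text does not say how such sites are handled.

```python
# Mapping stated in the text: helix (H, G, I) -> A, sheet (E, B) -> B,
# coil (S, T, C) -> C. Blanks are also mapped to C, as in the worked example.
REDUCE = {"H": "A", "G": "A", "I": "A", "E": "B", "B": "B"}

def reduce_states(secstr):
    """Collapse 8-state secondary structure codes to the 3 letters A, B, C."""
    return "".join(REDUCE.get(code, "C") for code in secstr)

def fragment(text, site, flank=4):
    """Return the window of 2*flank+1 characters centred on a 1-based site."""
    start = site - 1 - flank
    if start < 0 or site + flank > len(text):
        return None  # assumption: skip windows that run off the chain
    return text[start:site + flank]

# Hypothetical chain and secondary structure string of equal length.
sequence = "EVQLQQSGAEVVRSGASVKLS"
secstr = " " * 9 + "EEEETTSEEEEE"

primary_fragment = fragment(sequence, 11)
secondary_fragment = fragment(secstr, 11)
print(primary_fragment)
print(reduce_states(secondary_fragment))
```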

Figure 3.4: The primary structure of 9 amino acid code.

Figure 3.5: The secondary structure of 9 encode.

Only 420 antibodies meet our criteria, which is not enough. We therefore use all 10,109 INTERPARE proteins, which include the 420 antibodies. These proteins contribute 1,904,953 interaction sites to our positive MySQL database.

Figure 3.6: Interaction Dataset in Positive Database

3-2-2 Non-Interaction Dataset

Because developing antibodies is our main goal, we chose the 420 antibodies for the non-interaction dataset, which is built from their non-interaction sites. However, a protein contains many non-interaction sites, and using all of them would reduce accuracy. We therefore select only the non-interaction sites that lie within antibody interaction hotspots. The number of non-interaction positions is 41,623.

The non-interaction sites table, which includes the primary and secondary structure sequences, is built into the negative MySQL database. We randomly choose 80% of the protein non-interaction dataset to train our statistic model, and 20% of the protein non-interaction sites are randomly selected to test it.

3-3 Training statistic model

Figure 3.8 is our initial statistic model. We will use the interaction database and non- interaction database to train positive and negative statistic models.

Figure 3.8: Initial statistic model

Figure 3.9: Initial statistic model

The two models are trained on datasets of 33,298 protein interaction sites. Every interaction fragment contains nine amino acids. When we train a fragment into our model using its 1D and 2D sequences, the count recorded for every route the fragment passes through is increased by 1. For example, if we input TWNSGSLSS and BBAAACBCC, the counts are incremented as shown in Figure 3.10 and Figure 3.11. From these counts we can calculate the probability of every route and every amino acid.

Figure 3.10: Steps of input 1D sequences
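Because the model figures are not reproduced here, the following is only one possible reading of the counting scheme: a position-specific first-order chain in which each "route" is a transition between the symbols at consecutive positions. The class `ChainModel`, the pseudocount smoothing, the second training fragment, and the choice to add the 1D and 2D log probabilities are all assumptions made for illustration.

```python
import math
from collections import defaultdict

class ChainModel:
    """Position-specific first-order chain over fixed-length fragments."""

    def __init__(self, alphabet_size, pseudocount=1.0):
        self.first = defaultdict(float)     # counts of the symbol at position 0
        self.routes = defaultdict(float)    # counts of (position, prev, next)
        self.outgoing = defaultdict(float)  # counts of (position, prev)
        self.n = 0.0
        self.k = alphabet_size
        self.alpha = pseudocount

    def train(self, fragment):
        """Add 1 to every route the fragment passes through."""
        self.first[fragment[0]] += 1
        self.n += 1
        for i in range(len(fragment) - 1):
            self.routes[(i, fragment[i], fragment[i + 1])] += 1
            self.outgoing[(i, fragment[i])] += 1

    def log_prob(self, fragment):
        """Smoothed log probability of a fragment under the trained counts."""
        a, k = self.alpha, self.k
        total = math.log((self.first[fragment[0]] + a) / (self.n + a * k))
        for i in range(len(fragment) - 1):
            num = self.routes[(i, fragment[i], fragment[i + 1])] + a
            den = self.outgoing[(i, fragment[i])] + a * k
            total += math.log(num / den)
        return total

# One model for 1D sequences (20 amino acids), one for 2D sequences (A, B, C).
seq_model, ss_model = ChainModel(20), ChainModel(3)
for seq, ss in [("TWNSGSLSS", "BBAAACBCC"), ("TWNSGALSS", "BBAAACCCC")]:
    seq_model.train(seq)
    ss_model.train(ss)
score = seq_model.log_prob("TWNSGSLSS") + ss_model.log_prob("BBAAACBCC")
print(score)
```

Training one such pair of models on the positive database and another on the negative database gives the two scores that are compared for each test fragment.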

3-4 Test Positive and Negative Statistic Models

After training, the model holds the probability of every route and every amino acid in each block. We randomly select 20% of the protein interaction site and non-interaction site datasets to test our statistic models.

Figure 3.12: Test datasets in database

The test model consists of the interaction model and the non-interaction model, as shown in Figure 3.13. Every fragment of the test datasets is input to both the positive and the negative model to calculate the resulting probabilities.

Figure 3.13: Test Model

3-5 Result

After we input the test datasets to our models, we obtain the logistic probabilities of a protein's interaction sites produced by the models (Table 3.1).

Table 3.1: Logistic probabilities of a protein’s interaction sites

NO.            69        87        104       834       1664
Interact       -28.9999  -30.5067  -28.4584  -33.5395  -31.5420
Non-Interact   -27.4029  -32.4084  -28.8483  -34.7795  -30.3492

Row 1 is the serial number of the interaction site. Row 2 is the logistic probability that our interaction model produces for each fragment of the interaction test dataset, and row 3 is the logistic probability that our non-interaction model produces for the same fragment. To test the prediction performance of our statistic model, we calculate the ratio of row 2 to row 3 (Table 3.2).

Table 3.2: Ratio. The ratio of row2 to row3.

NO.            69        87        104       834       1664
Interact       -28.9999  -30.5067  -28.4584  -33.5395  -31.5420
Non-Interact   -27.4029  -32.4084  -28.8483  -34.7795  -30.3492
Ratio          1.0582    0.9413    0.9864    0.9643    1.0393    …

Row 4 of Table 3.2 gives the ratio of row 2 to row 3. If the ratio is greater than or equal to 1, the site is classified as negative; otherwise it is classified as positive. We can then count the true positives and the false positives, and obtain the result from the equation:

where

This equation shows how accurate our approach is. If the interaction dataset contains only antibodies, the accuracy is 65%. When the dataset includes all 10,109 proteins, the accuracy is 79.80%.
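The ratio rule can be checked against the five sites listed in Table 3.1. The sketch below only reproduces the ratio and the positive/negative decision; it does not attempt the accuracy equation. The helper name `classify` is an assumption.

```python
# Values copied from Table 3.1: site -> (interact, non_interact).
scores = {
    69: (-28.9999, -27.4029),
    87: (-30.5067, -32.4084),
    104: (-28.4584, -28.8483),
    834: (-33.5395, -34.7795),
    1664: (-31.5420, -30.3492),
}

def classify(interact, non_interact):
    """Apply the ratio rule from the text: ratio >= 1 means negative."""
    ratio = interact / non_interact
    return ratio, ("negative" if ratio >= 1 else "positive")

results = {site: classify(*pair) for site, pair in scores.items()}
for site, (ratio, label) in results.items():
    print(site, f"{ratio:.4f}", label)
```

Because both values are negative, a ratio of at least 1 means the interaction model assigned the lower value of the two, which is why such a site is labelled negative. Printed ratios may differ from Table 3.2 in the last digit because of rounding.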

Chapter 4 Three Applications of Our Approach

Our models can be applied to discover new antibodies, as described in Section 4.1. The positive model can also be applied to nanorice therapeutics and industrial manufacture, as described in Section 4.2.

4-1 Antibody Discovery

We select 10 antigens to build an antigen database. For every site of each antigen, interacting or not, the database stores the PDB_ID of the antigen and the nine-amino-acid fragment of the primary structure and of the secondary structure. We then input the antigen dataset to our test model. Table 4.1 shows the real interacting residues of each antigen and the interacting residues predicted by our positive and negative models. Most of the real interacting residues appear among the predicted interacting residues, and the real hotspots of an antigen (regions with a high density of interacting residues) are also hotspots in the prediction.

Table 4.1: Antigen's real interacting residues and predicted interacting residues

PDB_ID: 1AQD_G
Real interacting residues: {16, 17, 19}, {27, 28, 29}, {44, 47, 51, 52, 53, 58, 61, 62, 68, 72, 74, 76, 77, 79, 80, 81, 82, 83, 84, 85, 88, 95, 96}, {138, 139, 140, 141, 142, 143}, {157, 158, 160, 168, 172, 175, 177}
Predicted interacting residues: {27, 28, 29}, {36, 37, 38, 39, 40, 44, 46, 47, 50, 51, 52, 53, 55, 56, 61, 62, 63, 67, 68, 71, 72, 74, 75, 76, 77, 78, 79, 80, 81, 82, 85}, {98, 99, 100, 101, 102}, {113, 115}, {126, 127, 130, 131, 133, 134, 135, 137, 138, 139, 140, 141, 142, 143}, {155, 156, 158}, {166, 168, 169, 171, 172, 173, 175, 177}

PDB_ID: 1ISH_B
Real interacting residues: {10, 13}, {89, 92, 93}, {232, 235, 241, 243, 246, 247, 249}
Predicted interacting residues: {12, 13}, {21, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 40, 43, 44, 47, 48, 49, 50, 52, 54, 55, 56, 59, 60, 63, 66, 67, 68, 71}, {79, 82, 89, 90, 92, 93, 110, 113}, {134, 141, 142, 145, 146, 149, 152}, {176, 181, 183, 184, 185, 186, 188, 189, 191, 197, 199, 200, 201, 202, 203, 204, 205, 207, 208, 209, 212, 213, 215, 216, 219, 220}, {229, 232, 235, 241, 242, 243, 244, 246}

PDB_ID: 1JPF_A
Real interacting residues: {62, 63, 66, 69, 70, 73, 76, 77, 80, 84}, {121, 122, 123}, {146, 150, 152, 155, 159, 163, 167, 171, 178, 179, 180, 181, 182, 183}, {229, 231, 233, 234, 235, 236, 237, 238, 239}
Predicted interacting residues: {15, 16, 17, 18, 19}, {40, 41, 42, 43, 44, 45, 51, 54, 55, 56, 57, 58, 59, 62, 63, 66, 67, 69, 70, 73, 76, 77, 80, 84}, {121, 122, 128, 129, 130, 131, 132, 133, 135, 139, 140, 142, 146, 149, 150, 152, 153, 155, 158, 159, 160, 161, 162, 163, 166, 167, 170, 171, 182, 174, 175, 177, 178, 179, 180, 182, 184, 185, 191, 192, 193, 194, 195, 196}, {223, 224, 226, 227, 228, 229, 230, 231, 233, 234, 235, 236}, {251, 252, 253, 254, 255}

PDB_ID: 2HLA_A
Real interacting residues: {30, 31}, {119, 120, 121}, {180, 181, 182, 183}, {229, 231, 232, 234, 235, 236, 237, 238, 239}
Predicted interacting residues: {39, 41, 43, 44, 50, 53, 54, 55, 56, 57, 58, 61, 62, 65, 66, 68, 69, 72, 76, 79, 83, 87, 88, 89}, {120, 127, 128, 129, 130, 134, 138, 139, 141, 145, 148, 149, 150, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 165, 166, 167, 168, 169, 170, 171, 173, 174, 176, 177, 180, 181}, {192, 193, 194, 195, 196, 198}, {222, 223, 226, 229, 230, 231, 232, 234, 235}, {250, 251, 252, 253}

PDB_ID: 2VAA_A
Real interacting residues: {29, 30, 31}, {119, 120, 121, 122}, {152, 155, 159, 163}, {180, 181, 182, 183}, {190, 192}, {234, 235, 236, 237, …
Predicted interacting residues: {39, 40, 41, 42, 43, 44, 46, 47, 50, 53, 54, 55, 56, 57, 58, 61, 62, 65, 66, 68, 69, 72, 73, 75, 76, 79, 80, 83, 84, 86, 87, 89, 94}, {124, 127, 128, 129, 130, 131, 132, 134, … 176, 177, 178, 180, 181, 183, 184, 185, 190, 192, 193, 194, 195, 196, 198}, {222, 223, 224, 225, 226, 227, 228, 229, 230, 232, 233, 234, 235}, {250, 251, 252, 253, 254}

PDB_ID: 1FFP_A
Real interacting residues: {119, 120}, {146, 147, 150, 151, 156, 159}, {182, 183}, {229, 231, 232, 233, 234, 235, 236, 237, 238}
Predicted interacting residues: {13, 14, 15, 16, 17}, {28, 29, 30}, {38, 40, 41, 42, 43, 45, 46, 49, 52, 53, 54, 55, 56, 57, 60, 61, 64, 65, 67, 68, 71, 74, 75, 78, 79, 82}, {119, 120, 126, 127, 128, 129, 130, 131, 133, 137, 140, 144, 146, 147, 148, 149, 150, 151, 153, 154, 156, 157, 158, 159, 160, 161, 162, 164, 165, 166, 168, 169, 172, 173, 175, 176, 177, 178, 179, 180, 182, 183, 184, 189, 191}, {227, 228, 229, 230, 231, 232, 233, 234}

PDB_ID: 1I1F_A
Real interacting residues: {29, 30, 31}, {66, 69}, {80, 87, 92, 94}, {119, 120, 121}, {146, 150, 152, 155, 159, 163, 167}, {180, 181, 182, 183, 188, 190, 192}, {229, 231, 232, 233, 234, 235, 236, 237, 238}
Predicted interacting residues: {39, 41, 43, 44, 48, 50, 53, 56, 57, 58, 61, 62, 65, 66, 68, 69, 72, 75, 79, 80, 83, 86, 87, 88, 89, 94}, {125, 127, 128, 129, 130, 131, 134, 138, 141, 145, 146, 148, 149, 150, 151, 152, 154, 155, 157, 158, 159, 161, 162, 163, 165, 166, 167, 168, 169, 170, 173, 177, 180, 181, 185, 187, 188, 189, 190, 192, 193, 195, 196, 197, 198}, {220, 221, 222, 223, 225, 226, 229, 230, 231, 232, 233, 234, 235, 236, 237}

PDB_ID: 1E28_A
Real interacting residues: {62, 66, 69, 73, 76, 80}, {119, 120, 121}, {152, 155, 156, 171, 178, 179, 181, 182, 183}, {234, 235, 236, 237, 238, 239}
Predicted interacting residues: {50, 53, 54, 55, 56, 57, 58, 61, 62, 65, 66, 68, 69, 72, 73, 75, 76, 79, 80, 83}, {138, 139, 141, 145, 148, 149, 150, 151, 152, 153, 155, 156, 157, 158, 162, 165, 166, 169, 170, 171, 173, 174, 175, 177, 178, 180, 181}, {192, 193, 194, 195, 196, 197, 198}, {220, 221, 222, 223, 225, 226, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238}, {250, 252}

PDB_ID: 1KLU_A
Real interacting residues: {13, 14, 15, 16}, {24, 25, 26, 27, 33, 36, 37, 43, 44, 47, 49, 50, 51, 52, 53, 55}, {64, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 82, 83, 85, 92, 93, 94}, {108, 113, 115}, {135, 136, 137, 138, 139, 140}
Predicted interacting residues: {24, 25, 26, 33, 34, 35, 36, 37, 43, 44, 46, 47, 49, 50, 51, 52, 53, 55, 56, 59, 64, 68, 69, 71, 72, 73, 75, 76, 77, 78, 79, 82}, {110, 112, 115}, {123, 124, 127, 128, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140}, {152, 153, 155}, {163, 165, 166, 168, 169, 170, 172}

PDB_ID: 2MHA_C
Real interacting residues: {30, 31}, {57, 58, 59, 60, 61, 62, 63, 66}, {80, 81, 84}, {119, 120, 121, 122}, {152, 155, 156, 159, 163, 167}, {190, 192}, {234, 235, 236, 237, 238}
Predicted interacting residues: {39, 40, 41, 42, 43, 44, 46, 47, 50, 53, 54, 55, 56, 57, 58, 61, 62, 65, 66, 68, 69, 72, 76, 79, 80, 83, 84, 86, 87, 89, 94}, {127, 128, 129, 130, 131, 132, 134, 137, 138, 139, 141, 144, 145, 148, 149, 150, 151, 152, 155, 157, 158, 159, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 180, 181, 182, 183, 184, 190, 192, 193, 194, 198}, {223, 225, 226, 227, 228, 229, 230, 231, 232, 234, 235}, {250, 251, 252, 253}

Next, hotspots are identified. The interacting surface area (ASA) is usually about 300. Within a hotspot, the number of amino acids between adjacent interacting residues is less than 8, and the first and last interacting residues of the hotspot lie in an α-helix or β-sheet in the 2D structure. The real interaction sites are exposed on the interface, so we use NetSurfP to find the exposed and buried positions. Surface interaction positions are presented in bold, and non-surface interaction positions in regular type. Figure 4.1 shows the method for finding hotspots. The real and predicted hotspots of the antigens are compared in Table 4.2. The predicted hotspots are even closer to the real hotspots if the non-surface interacting residues in Table 4.1 are excluded.


Figure 4.1: Method for finding hotspots.

Table 4.2: Comparing antigen’s real hotspots and predicted hotspots

PDB_ID | Real hotspots | Predicted hotspots
1AQD_G | 16~19, 27~29, 44~96, 138~143, 157~160, 168~177 | 27~29, 36~85, 98~102, 113~115, 126~143, 155~158, 166~177
1ISH_B | 10~13, 89~93, 232~249 | 12~13, 21~71, 79~113, 134~152, 176~220, 229~246
1JPF_A | 62~84, 121~123, 146~183, 229~239 | 15~19, 40~84, 121~196, 223~236, 251~255
2HLA_A | 30~31, 119~121, 180~192, 229~239 | 39~89, 120~181, 192~198, 222~235, 250~253
2VAA_A | 29~31, 119~122, 152~163, 180~183, 190~192, 234~239 | 39~94, 124~198, 222~235, 250~254
1FFP_A | 119~120, 146~159, 182~183, 229~238 | 13~17, 28~30, 38~82, 119~191, 227~234
1I1F_A | 29~31, 66~69, 80~94, 119~121, 146~167, 180~192, 229~238 | 39~94, 125~198, 220~237
1E28_A | 62~80, 119~121, 152~156, 171~183, 234~239 | 50~83, 138~181, 192~198, 220~238, 250~252
1KLU_A | 13~16, 24~55, 64~94, 108~115, 135~140 | 24~82, 110~115, 123~140, 152~155, 163~172
2MHA_C | 30~31, 57~60, 80~84, 119~122, 152~167, 190~192, 234~238 | 39~94, 127~198, 223~235, 250~253
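One possible way to quantify how close the predicted hotspots are to the real ones is to count how many real hotspot positions fall inside a predicted hotspot. The sketch below is an illustration only, not a measure used in the thesis; the function names are hypothetical, and the ranges are the 1KLU_A row of Table 4.2 read as inclusive intervals.

```python
def expand(ranges):
    """Turn inclusive (start, end) ranges into a set of residue positions."""
    positions = set()
    for start, end in ranges:
        positions.update(range(start, end + 1))
    return positions


def coverage(real_ranges, predicted_ranges):
    """Return (covered, total): real positions inside predicted hotspots, and all real positions."""
    real = expand(real_ranges)
    predicted = expand(predicted_ranges)
    return len(real & predicted), len(real)


# 1KLU_A row of Table 4.2.
REAL_1KLU_A_RANGES = [(13, 16), (24, 55), (64, 94), (108, 115), (135, 140)]
PRED_1KLU_A_RANGES = [(24, 82), (110, 115), (123, 140), (152, 155), (163, 172)]

covered, total = coverage(REAL_1KLU_A_RANGES, PRED_1KLU_A_RANGES)
print(f"{covered} of {total} real hotspot positions lie inside predicted hotspots")
```

Counting positions rather than whole ranges avoids having to decide when two partially overlapping ranges count as a match.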

After the predicted hotspots of an antigen are found, the Protinfo PPC web server from the University of Washington can be used to search for possible newly created antibodies [45].

The complete procedure of Protinfo PPC is shown in Figure 4.2 and is described below:

(1) The sequences of the antigen’s hotspots are presented as target A and target B.

Perform local alignment between targets and antigens in antigen-antibody interacting


(2) Next, the related antibody fragments, which can form a newly created antibody, are docked with the antigen, and a 3D profile of the antibody is generated. Many antibodies with discontinuous sequences can be created after this step.

(3) Suitable loop sequences taken from protein loop databases are filled in between the discontinuous sequences. A number of different complete antibodies can be generated in this way.

(4) An energy minimization program is used to find the 3D profiles of the antibodies from Step 3. Each of these profiles is compared with the profile from Step 2, and the profile with the smallest RMSD value is selected.
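The selection in Step 4 can be illustrated with a plain RMSD calculation. The sketch below assumes that each 3D profile is a list of (x, y, z) coordinates for corresponding atoms and that the structures are already superimposed; it does not perform the optimal superposition that structure-comparison tools usually apply first, and it is not the Protinfo PPC implementation. The names `rmsd` and `pick_closest` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pick_closest(reference, candidates):
    """Return the name of the candidate profile with the smallest RMSD to the reference.

    candidates: dict mapping a candidate name to its coordinate list.
    """
    return min(candidates, key=lambda name: rmsd(reference, candidates[name]))


# Made-up coordinates, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidates = {
    "antibody_1": [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],
    "antibody_2": [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],
}
print(pick_closest(reference, candidates))
```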

Figure 4.2: Protinfo PPC


4-2 Nanorice Therapeutics and Industrial Manufacture

Researchers at the Georgia Institute of Technology and the University of California, San Francisco (UCSF), who had previously shown that gold nanoparticles have potential in noninvasive cancer treatment and imaging, have found an even more effective and safer way to detect and kill cancer cells. By changing the shape of the gold nanoparticles from spheres to cylindrical nanorods, they can detect malignant tumors hidden deeper under the skin, as is the case with breast cancer, and selectively destroy them with lasers only half as powerful as before, without harming healthy cells. The method allows for safer and more deeply penetrating noninvasive cancer treatment [46].

The research team showed that gold nanorice coated with a cancer antibody was very effective at binding to tumor cells. When bound to the gold, the cancer cells scattered light, making it very easy to distinguish the malignant cells from the noncancerous ones. The nanorice also absorbed the laser light more easily, so the coated malignant cells required only half the laser energy needed to kill benign cells. This makes it relatively easy to ensure that only the malignant cells are destroyed [47].

Figure 4.3 shows how the antibodies are distributed over ranges of total score. These statistics help us check whether an antibody interacts with a protein of a healthy cell in nanorice therapeutics. Figure 4.4 shows the process for calculating the total score between an antibody and any protein to be checked. The antibody and the protein are docked first. The interacting residues of the antibody are then identified, and the total score is calculated using the positive model. Interaction does not


lab is required to test interaction. Similarly, this technique can be used to check whether the antibody interacts with any bacterial protein when bacteria are used to manufacture antibodies industrially.

Figure 4.3: Antibody total score ranges vs. antibody total number

Figure 4.4: Calculating the interaction score between an antibody and any protein to be checked.


Chapter 5

Conclusion and Future Works

A method for predicting protein interaction sites has been developed in this thesis. A statistical model similar to a hidden Markov model is used to predict interactions between protein residues. A large amount of biological data is used to reach higher accuracy, and the model is trained to obtain the probabilities of all parameters. The average accuracy of predicting interactions between protein residues reaches 79.80%. Furthermore, we use our models to predict the hotspots of antigens in order to find antibodies. Our method can also predict whether an antibody interacts with the proteins of healthy cells in nanorice therapeutics. When bacteria are used to manufacture antibodies industrially, our method predicts whether the antibody interacts with the proteins of those bacteria.

In order to get better results, further research works can be conducted in the future:

1. Update our antibody interaction database. The antibody interaction data set is not yet large enough; including more antibodies should increase accuracy.

2. Design a software system like Protinfo PPC. First, build antigen and antibody interaction databases. After an antigen function site is found, use the Smith-Waterman local alignment algorithm to find possible interacting antibody fragments, then use AutoDock to calculate the 3D structure of the fragments. After the fragments are combined with suitable loop sequences, apply the energy minimization described in [45]; the resulting antibodies with low RMSD values are candidates for newly created antibodies.
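The local alignment step in item 2 could look like the following. This is a minimal Smith-Waterman sketch with a linear gap penalty; the match, mismatch and gap scores are placeholder assumptions (a real protein alignment would normally use a substitution matrix), and the function name is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b) for a local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up sequences, for illustration only.
print(smith_waterman("AAAGGG", "TTGGGTT"))
```

A linear gap penalty keeps the sketch short; an affine penalty would need two extra matrices.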


References

[1] Kohler G, Milstein C: Continuous cultures of fused cells secreting antibody of predefined specificity. Nature 1975, 256(5517):495-497.

[2] Biokin http://biokin.itis.org.tw/default.aspx

[3] Valencia A., Pazos F., (2002). “Computational methods for prediction of protein interactions”, Curr Opin Struc Biol 12:368-373.

[4] Teichmann,S.A., Murzin,A.G. and Chothia,C., (2001). “Determination of protein function, evolution and interactions by structural genomics”, Curr. Opin. Struct. Biol., 11, 354–363.

[5] Valencia,A. and Pazos,F., (2002). “Computational methods for prediction of protein interactions”, Curr. Opin. Struct. Biol., 12,368–373.

[6] Valencia,A. and Pazos,F., (2003). “Prediction of protein–protein interactions from evolutionary information”, In Bourne,P.E. and Weissig,H. (eds), Structural Bioinformatics. Wiley Inc., pp. 411– 426.

[7] Li,S., Armstrong,C.M., Bertin,N., Ge,H., Milstein,S., Boxem,M., Vidalain,P., Han,J.J., Chesneau,A., Hao,T. et al., (2004). “A map of the interactome network of the metazoan C. elegans”, Science, 303, 540–543.

[8] Jones,S. and Thornton,J.M. (1997) Prediction of protein–protein interaction sites using patch analysis. J. Mol. Biol., 272, 133–143.

[9] Valdar,W.S. and Thornton,J.M. (2001) Conservation helps to identify biologically relevant crystal contacts. J. Mol. Biol., 313, 399–416.

[10] Gallet,X., Charloteaux,B., Thomas,A. and Brasseur,R. (2000) A fast method to predict protein interaction sites from sequences. J. Mol. Biol., 302, 917–926.


[11] Sheinerman,F.B. and Honig,B. (2002) On the role of electrostatic interactions in the design of protein–protein interfaces. J. Mol. Biol., 318, 161–177.

[12] Pazos,F., Helmer-Citterich,M., Ausiello,G. and Valencia,A. (1997) Correlated mutations contain information about protein–protein interaction. J. Mol. Biol., 271, 511–523.

[13] Valencia,A. and Pazos,F. (2003) Prediction of protein–protein interactions from evolutionary information. In Bourne,P.E. and Weissig,H. (eds), Structural Bioinformatics. Wiley Inc., pp. 411– 426.

[14] Lu,L., Arakaki,A.K., Lu,H. and Skolnick,J. (2003) Multimeric threading-based prediction of protein–protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res., 13, 1146–1154.

[15] Keskin,O., Bahar,I., Badretdinov,A.Y., Ptitsyn,O.B. and Jernigan,R.L. (1998) Empirical solvent-mediated potentials hold for both intra-molecular and inter-molecular inter-residue interactions. Protein Sci. 7, 2578–2586.

[16] Zhou,H. and Shan,Y. (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins, 44, 336–343.

[17] Fariselli,P., Pazos,F., Valencia,A. and Casadio,R. (2002) Prediction of protein–protein interaction sites in heterocomplexes with neural networks. Eur. J. Biochem., 269, 1356–1361.

[18] Ofran,Y. and Rost,B. (2003) Predicted protein–protein interaction sites from local sequence information. FEBS Lett., 544, 236–239.

[19] Yan, C., Dobbs, D., and Honavar, V. (2004) A two-stage classifier for identification of protein-protein interface residues. Bioinformatics 20(S1): i371-i378.


[20] Litman GW, Rast JP, Shamblott MJ (1993). "Phylogenetic diversification of immunoglobulin genes and the antibody repertoire". Mol. Biol. Evol. 10 (1): 60–72.

[21] Eleonora Market, F. Nina Papavasiliou (2003) V(D)J Recombination and the Evolution of the Adaptive Immune System. PLoS Biology 1(1): e16.

[22] Janeway CA, Jr et al. (2001). Immunobiology. (5th ed.). Garland Publishing.

[23] Rhoades RA, Pflanzer RG (2002). Human Physiology (4th ed.). Thomson Learning. ISBN 0-534-42174-1.

[24] INTERPARE http://www.interpare.net

[25] Sungsam Gong, Giseok Yoon, Insoo Jang, Dan Bolser, Panos Dafas, Michael Schroeder, Hansol Choi, Yoobok Cho, Kyungsook Han, Sunghoon Lee, Hwanho Choi, Michael Lappe, Liisa Holm, Sangsoo Kim, Donghoon Oh and Jonghwa Bhak “PSIbase: a database of Protein Structural Interactome map (PSIMAP)”, Bioinformatics, Vol. 21 no. 10 2005, pages 2541–2543.

[26] Momen-Roknabadi A, et al. (2008). Impact of residue accessible surface area on the prediction of protein secondary structures. BMC Bioinformatics. 9:357. PMID 18759992.

[27] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf (2000). Computational Geometry (2nd revised ed.). Springer-Verlag. ISBN 3-540-65620-0. Chapter 7: Voronoi Diagrams: pp. 147–163. Includes a description of Fortune's algorithm.

[28] Rolf Klein (1989). Concrete and Abstract Voronoi Diagrams. Lecture Notes in Computer Science. 200. Springer-Verlag. ISBN 3540520554.

[29] Thomas J. Bollenbach and Thomas Nowak. Kinetic Linked-Function Analysis of the Multiligand Interactions on Mg Activated Yeast Pyruvate Kinase. Biochemistry, 2001, 40 (43), pp. 13097–13106.


[30] Teichmann,S.A., Murzin,A.G. and Chothia,C., (2001). “Determination of protein function, evolution and interactions by structural genomics”, Curr. Opin. Struct. Biol., 11, 354–363.

[31] Valencia,A. and Pazos,F., (2002). “Computational methods for prediction of protein interactions”, Curr. Opin. Struct. Biol., 12,368–373.

[32] Valencia,A. and Pazos,F., (2003). “Prediction of protein–protein interactions from evolutionary information”, In Bourne,P.E. and Weissig,H. (eds), Structural Bioinformatics. Wiley Inc., pp. 411– 426.

[33] Wikipedia http://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction_prediction

[34] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. (1999) "Assigning protein functions by comparative genome analysis: protein phylogenetic profiles." Proc Natl Acad Sci U S A., 96, 4285-8.

[35] Aloy P.,Russell R.B. "InterPreTS: Protein Interaction Prediction through Tertiary Structure." Bioinformatics, 19 (1), 161-162.

[36] Tan S.H., Zhang Z., Ng S.K. (2004) "ADVICE: Automated Detection and Validation of Interaction by Co-Evolution." Nucl. Ac. Res., 32 (Web Server issue):W69-72

[37] Aytuna A. S., Keskin O., Gursoy A. (2005) "Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces." Bioinformatics, 21 (12), 2850-2855.

[38] Ogmen U., Keskin O., Aytuna A.S., Nussinov R. and Gursoy A. (2005) "PRISM: protein interactions by structural matching." Nucl. Ac. Res., 33 (Web Server issue):W331-336.


[39] Keskin O., Ma B. and Nussinov R. (2004) "Hot regions in protein-protein interactions: The organization and contribution of structurally conserved hot spot residues" J. Mol. Biol., (345),1281-1294.

[40] Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. (2003) "A Bayesian networks approach for predicting protein-protein interactions from genomic data." Science, 302(5644):449-53.

[41] Aloy P., and R. B. Russell. (2003) "InterPreTS: protein Interaction Prediction through Tertiary Structure". Bioinformatics, 19 (1), 161-162.

[42] Chen YC, YS Lo, WC Hsu, and JM Yang. (2007). "3D-partner: a web server to infer interacting partners and binding models". Nucleic Acids Research, 35 (Web Server issue): 561-7.

[43] Fukuhara, Naoshi, and Takeshi Kawabata. (2008) "HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures" Nucleic Acids Research, 36 (S2): 185-.

[44] Kittichotirat W, M Guerquin, RE Bumgarner, and R Samudrala (2009) "Protinfo PPC: a web server for atomic level prediction of protein complexes" Nucleic Acids Research, 37 (Web Server issue): 519-25.

[45] Weerayuth Kittichotirat, Michal Guerquin, Roger E. Bumgarner and Ram Samudrala (2009) Protinfo PPC: A web server for atomic level prediction of protein complexes. Nucleic Acids Research, 2009, Vol. 37, Web Server issue W519–W525.

[46] Xiaohua Huang, Ivan H. El-Sayed, Wei Qian. and Mostafa A. El-Sayed. Cancer Cell Imaging and Photothermal Therapy in the Near-Infrared Region by Using Gold Nanorods. J. Am. Chem. Soc., 2006, 128 (6), pp 2115–2120.

[47] Hui Wang, Daniel W. Brandl, Fei Le, Peter Nordlander, and Naomi J. Halas. Nanorice: A Hybrid Plasmonic Nanostructure. Nano Lett., 2006, 6 (4), pp 827–832.


Appendix

Appendix A

This appendix lists the parameters of our positive test model. The values are listed as natural logarithms of probabilities. The parameter in gives the start probabilities of the model for α-helix, β-sheet and coil. The parameter G gives the probabilities of the amino acid glycine in our test model; the other amino acids follow under their one-letter codes. For each amino acid, the first { } gives the probabilities of the nine blocks in α-helix, the second { } those of the nine blocks in β-sheet, and the third { } those of the nine blocks in coil. AA lists the probabilities of going from α-helix to α-helix along the model, AB those from α-helix to β-sheet, and AC those from α-helix to coil; BA to CC are defined in the same way.
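The parameters described above could be combined into a score for one window of residues roughly as follows. This is a sketch under stated assumptions, not the scoring code of the thesis: it assumes that the listed values are log-probabilities that are simply added, as in a hidden Markov model, that the k-th entry of an amino acid's list belongs to the k-th position of the window, and that the k-th transition entry links positions k and k+1. The function name, the dictionary layout and the toy numbers are all hypothetical.

```python
def window_log_score(residues, states, start, emission, transition):
    """Log-score of one window of residues with known secondary-structure states.

    residues   : one-letter amino acid codes, e.g. "GA"
    states     : same length, over "A" (alpha-helix), "B" (beta-sheet), "C" (coil)
    start      : dict state -> log-probability of starting in that state
    emission   : dict amino acid -> dict state -> list of per-position log-probabilities
    transition : dict such as "AC" -> list of log-probabilities, one per step
    """
    if len(residues) != len(states) or not residues:
        raise ValueError("residues and states must be non-empty and equally long")
    score = start[states[0]]
    for k, (aa, state) in enumerate(zip(residues, states)):
        score += emission[aa][state][k]                         # residue at position k
        if k > 0:
            score += transition[states[k - 1] + state][k - 1]   # step k-1 -> k
    return score


# Toy two-position example with made-up numbers (not values from this appendix).
toy_start = {"A": -1.0, "B": -1.5, "C": -0.9}
toy_emission = {"G": {"A": [-3.0, -3.1]}, "A": {"C": [-2.7, -2.8]}}
toy_transition = {"AC": [-2.3]}
print(window_log_score("GA", "AC", toy_start, toy_emission, toy_transition))
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow for longer windows.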

in

{-1.0777484790871408, -1.41793359126078, -0.8736504077353311}

G

{-3.1472099796503588, -3.176290472458474, -3.132314282148041, -3.1271823434115507, -3.2674273229628823, -3.1966431861736067, -3.191360861582931, -3.1405311497865993, -3.1704568492847907}

{-3.0265814305739434, -2.9168862794395998, -2.893277584798866, -2.812688108111485, -3.0427160035491525, -2.841998173611949, -3.0399703917915564, -2.940463884492605, -2.983449354200421}

{-2.096009620981201, -2.064385422239733, -2.05657944445429, -2.082847989410555, -2.119986681270305, -2.055625172856532, -2.0552605041272054, -2.086381997532866, -2.0962277155624154}

A

{-2.1455519319871406, -2.1477286968354514, -2.171977549526432, -2.1521312750481134, -2.2412306130592263, -2.137561736262132, -2.1686246078935123, -2.172829020263138, -2.144980591112694}


{-2.768752321271844, -2.724105591855508, -2.7452681729936903, -2.631495856914682, -2.919441111091819, -2.7316501164430833, -2.783850572235712, -2.800046705662858, -2.7003230982845006}

{-2.7255797122714434, -2.7423509786322, -2.8078202060431514, -2.781021709456435, -2.845962132143586, -2.7047217187282295, -2.7660877032233233, -2.7628381431244042, -2.7573675086213933}

V

{-2.735603603520112, -2.7060226075129585, -2.679960426837384, -2.748114707154411, -2.8152351384467487, -2.74235435628598, -2.7700169248514097, -2.733754566080624, -2.835982615326251}

{-2.0161437154904993, -2.0563256251739856, -2.0267775174277265, -2.0414101001857605, -2.218591710169781, -2.0910314221631774, -2.042659217843056, -2.086369360488356, -2.0656013938776594}

{-3.109090041732639, -3.091401997460273, -3.084337581194451, -3.063862960688206, -3.113584534920831, -3.042440838304006, -3.0897108485780294, -3.030926521675784, -3.1158441072936385}

L

{-2.1942289803824573, -2.1729883635946896, -2.1650142718742873, -2.1551592864457545, -2.1779684162812596, -2.210542439668368, -2.126002432869109, -2.1803365630328737, -2.196386916037448}

{-2.3019653222893606, -2.357270491504177, -2.404113457814055, -2.393027767244314, -2.3821643012386584, -2.2384631517416906, -2.409436803959598, -2.3630603632449154, -2.328692063192118}

{-2.773316047895526, -2.8371032783662673, -2.8242140158188276,


I

{-2.8724446458122785, -2.878616205378244, -2.854313813982162, -2.8369983932474456, -2.9077954793376133, -2.7918830481334123, -2.7949835475818707, -2.834320999773776, -2.9224746466796945}

{-2.332156294568506, -2.430659814686977, -2.3982910250626217, -2.4653113375237004, -2.48852933732249, -2.399145533432164, -2.4359387447051044, -2.3701375336190007, -2.379914332330163}

{-3.4287660178597172, -3.387558696903026, -3.276199139561885, -3.278498970398999, -3.373615076042102, -3.3286683650447926, -3.3851269116295466, -3.2843754224853225, -3.3723240788754403}

F

{-3.24416422065525, -3.2486111340381005, -3.1716161344956544, -3.259855244067479, -3.1636626129554637, -3.141130476243348, -3.180976552277215, -3.184803973464587, -3.262379559475451}

{-2.917646747210275, -2.855712056070446, -2.9222651216721185, -2.8707764456318645, -2.768979111212414, -2.8636596703931283, -2.848538246927143, -2.8701318765484185, -2.8521932754130335}

{-3.5202717844371865, -3.387558696903026, -3.3613569479021916, -3.3710505277642437, -3.4339872044851463, -3.384459724673208, -3.5225107315997377, -3.4201070360194477, -3.552952155163062}

Y

{-3.3315870356264092, -3.309519887125093, -3.3514092449362507, -3.368526599372607, -3.2449038482957993, -3.332290981704507, -3.3347829387631482, -3.335531164840325, -3.3491920304475182}


{-3.0265814305739434, -2.947730954790698, -2.8814431271518632, -2.897444692714026, -2.8143383606113446, -2.9368322762827876, -2.9024728861620903, -2.9480301248759213, -2.9731133448697586}

{-3.503218996054467, -3.509606131345867, -3.4700901480386572, -3.599257381035, -3.371596911885865, -3.4880004036140484, -3.5060203697003223, -3.566942394270333, -3.505855471499963}

W

{-4.136977071467123, -4.115030035872082, -4.128816769113936, -4.066788880599322, -3.90950902551626, -4.168352768824785, -4.148125701384861, -4.217721002867302, -3.997991220553155}

{-4.060815057428617, -3.919763669673088, -4.043183167854527, -4.062070791698029, -3.762452638298337, -3.9334929945111945, -3.823814133822314, -3.9162805956807873, -3.812728709084946}

{-4.692803062928303, -4.598648968997826, -4.484510345486419, -4.538655687919305, -4.442374401560531, -4.37627809926053, -4.489969750048292, -4.625549348324744, -4.587660641851691}

H

{-3.744487122222092, -3.7369639019520324, -3.803394368679308, -3.7907123542678494, -3.7243056223474165, -3.798910512966002, -3.7035792432342256, -3.704585909252649, -3.7310711395498033}

{-3.6529547385944983, -3.628901944303928, -3.5769460214042677, -3.653984450109991, -3.5191542850056674, -3.740589328386703, -3.6393850946887945, -3.6488012305465256, -3.8067939735651315}

{-3.6402070115469005, -3.6458885573422792, -3.602335366852436,


D

{-2.9915936686066176, -2.967198674599915, -3.071331771869689, -3.149473033309922, -3.031158441787471, -3.0278017909363446, -3.097760246256137, -2.979095483972479, -2.9365925282254794}

{-3.527228849424178, -3.573332093149117, -3.2602090289047876, -3.56392362619181, -3.5381121987502815, -3.502355531348903, -3.461459149367878, -3.5108154875726227, -3.4856542349225044}

{-2.5763563252549133, -2.5991013365387556, -2.600834364869476, -2.6086411517189507, -2.5906825916098013, -2.61537661839238, -2.641514937143692, -2.601861498226746, -2.5949058545324375}

N

{-3.354059891478468, -3.2832966920259903, -3.419320450552461, -3.3019175192409316, -3.3236482158073555, -3.3469253892229447, -3.3741000421039833, -3.4766097631002304, -3.3420576363336436}

{-3.66257019729394, -3.6053714468937335, -3.6402274713517877, -3.6179908474620857, -3.602331791484575, -3.7464544478391013, -3.603205438111292, -3.6642257008721573, -3.771906714564691}

{-2.8432240228114223, -2.820071236471835, -2.825395350864659, -2.82919616175956, -2.758156473936662, -2.7982681381541723, -2.78408593934594, -2.8313418974580484, -2.7735341424767403}

E

{-2.4885309631475687, -2.5577392673152906, -2.543851713198561, -2.5132432156064755, -2.5007418085443107, -2.56227244208553, -2.5865544666216636, -2.5051215224816428, -0.48591311589854097}


{-3.103644778733538, -3.1678669850409524, -3.177790082489368, -3.2058143047659344, -3.105894905170684, -3.1851593983839583, -3.2062365578322134, -3.1461723739847134, -0.625110339102301}

{-2.9143593696760206, -2.9162261638072366, -2.9498495253380645, -2.938350259048477, -2.916318098065481, -2.9844740823296694, -2.9265757761155986, -2.9022576783741494, -0.5945049416605906}

Q

{-3.1700562485015285, -3.034600066287163, -3.1023420086122493, -3.0898131969035862, -3.0980979244625804, -3.1019097630900667, -3.152461381969194, -3.126845696701345, -3.123664687778032}

{-3.477836094094602, -3.495370551679405, -3.5174318938718603, -3.4297943262251103, -3.3629935542503846, -3.422667370745204, -3.5633991877108726, -3.542420826987954, -3.468632547353074}

{-3.4888302586023676, -3.387558696903026, -3.4346882209877414, -3.4335708847455777, -3.335949573155627, -3.3601670321041635, -3.3955328327906633, -3.3417601062662747, -3.349286410808423}

K

{-2.8237287730882206, -2.889666041564829, -2.7975821851770726, -2.8340043790348406, -2.795148379880011, -2.7932944809718196, -2.7879856183942264, -2.834320999773776, -2.9271584959921206}

{-3.031709646940863, -3.1324650579900366, -3.1714809132961035, -3.14735610427934, -3.0027106689354532, -3.1456805874101708, -3.0284760123658216, -3.0862742324036447, -3.1467325175909515}

{-2.915700751500222, -2.9892110565637253, -2.983889571578056,


R

{-2.814852975174252, -2.7687134311471664, -2.848188220555479, -2.819167311604373, -2.624495011187198, -2.84548023414239, -2.879954107536648, -2.784840942510406, -2.814780407675648}

{-3.1931706151822463, -3.233006287211914, -3.1497071137465427, -3.2226781108179394, -2.8569988577540597, -3.0681223530642963, -3.175982149474411, -3.074713410002569, -3.113646793005186}

{-3.11072003394357, -3.045025835670122, -3.04088505262742, -3.050578632489472, -2.960769860636315, -3.081021290674669, -3.018358154521595, -3.0861552796235694, -3.0131099884180315}

S

{-2.9951147990051963, -3.003884011056716, -2.9128285280435886, -2.924691479991474, -3.1038838615296243, -2.962147820714908, -2.9812454415877485, -2.960837715398914, -2.9851457536424704}

{-2.836193602067376, -2.921570128752026, -2.9173751363779266, -2.912296450850052, -2.849761190453829, -2.873439699446768, -2.8087669235002464, -2.804423080262657, -3.0097666625177943}

{-2.615347619220097, -2.5923543229921613, -2.6452432062221987, -2.6431603987604255, -2.714369447157838, -2.6902884784835726, -2.6172064966671504, -2.6486129257063293, -2.6458404780479032}

T

{-3.029198815784933, -3.093162595495242, -3.0524276172305362, -3.097565173707904, -3.077164714796115, -3.049463287717524, -3.0714923194355266, -3.0665637949431, -3.0292206838615563}


{-2.6219692009822966, -2.625123758306657, -2.607382822921181, -2.639057329615259, -2.6246964089617255, -2.7488548418927143, -2.574674357696366, -2.583035455650712, -2.513445724954685}

{-2.6832050946157318, -2.7268467920962345, -2.7657446421686824, -2.725482124187433, -2.8005262449756754, -2.771269872918017, -2.7506006680221606, -2.797684874454572, -2.720263758418016}

M

{-3.554730586940619, -3.595592948771173, -3.5707720734105544, -3.593585248622916, -3.54378424605761, -3.578086251442883, -3.3969147198701544, -3.5933602741424244, -3.46548453072584}

{-3.6481814598418407, -3.6147173093119713, -3.6970903332463307, -3.6230541494186324, -3.7928501154827083, -3.6352288127288768, -3.6660533417709558, -3.674642461730413, -3.7381956572223793}

{-3.8153521643704122, -4.034959855674767, -3.9975702067097854, -4.00345425815517, -4.002223735464476, -4.025699045631136, -3.98164725650042, -4.045142790720247, -3.9533539613146793}

C

{-4.445124974511194, -4.3725412306905, -4.3155436193758065, -4.495601352704478, -4.492114331676381, -4.219940137615278, -4.428587525863155, -4.473172797414169, -4.445420527760351}

{-3.998076716795194, -3.7030099164576495, -4.020877410340228, -3.824853513172158, -4.028416186795475, -4.053939147587061, -3.9811343884108514, -4.051718566575891, -4.053890765901834}

{-4.429218542055583, -4.325355633998145, -4.318890638310396,


P

{-3.7042623209115826, -3.8505889274597522, -3.835655230897529, -3.6227607377568476, -3.6381939305286846, -3.8066624897703196, -3.639476357041466, -3.6976893301935885, -3.660215202782459}

{-4.075308064731185, -3.697894815790879, -3.8772864907279723, -3.831084062922794, -3.681949419449267, -3.8783065789439037, -3.7273538676347515, -3.84551152479258, -3.7002507256582557}

{-2.4806985611578987, -2.525477040331585, -2.4951906032106055, -2.4479145909855355, -2.483937992686037, -2.4888736661607442, -2.4246905221809607, -2.4185685960431593, -2.4791805441919363}

AA

{-0.11054072845386553, -0.11706635967900694, -0.11332514642978274, -0.11092076257300289, -0.1166986485540919, -0.11973045594077908, -0.1108016260987283, -0.10249766133782619}

AB

{-5.346490056168674, -5.328140917500478, -5.46162244528284, -5.728870526518845, -5.480725725130163, -5.12163207006828, -5.4919462813120505, -5.24387821231999}

AC

{-2.3037328439698204, -2.247900397174975, -2.274065303672067, -2.2853387810301284, -2.244501261927349, -2.2361029366479035, -2.2949752918318995, -2.3844461379586543}

BA

{-4.836405907199637, -4.422132583420855, -4.369184104608444, -4.54954905146703, -4.437795194224776, -4.463412277092765, -4.420501048194697, -4.438279520044903}


BB

{-0.2621782929206016, -0.2877242087627532, -0.2967798696024034, -0.2829044909119928, -0.2748761486814421, -0.27357500335094714, -0.2528561743168481, -0.25958003993108614}

BC

{-1.5019717416971643, -1.435387946364244, -1.4100509835027826, -1.4446174859480463, -1.4761673326778244, -1.479191793799272, -1.5540399609202185, -1.528741796806578}

CA

{-2.5167040880021165, -2.5232766610609687, -2.4907521779154362, -2.448765055398181, -2.4746753348370905, -2.407647399054195, -2.4635584113721913, -2.421491994868105}

CB

{-2.129830220003835, -2.1765893157505847, -2.140378452053433, -2.2110780925162765, -2.153560061411043, -2.1543810882848766, -2.04379771254138, -2.090196646426426}

CC

{-0.2226220670503171, -0.21520531415033786, -0.2237161548539694, -0.2181344489957895, -0.22346925908864798, -0.23067407154660857, -0.24164784745068255, -0.2388307817197822}
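As a consistency check on the listing above, the values can be converted back to probabilities. If they are natural-log probabilities, the three start values should sum to about 1, and so should the three transitions out of α-helix (AA, AB, AC) at each of the eight steps. The sketch below uses the values from this appendix rounded to six decimals; the variable and function names are hypothetical.

```python
import math

# Values copied from Appendix A, rounded to six decimals.
START = [-1.077748, -1.417934, -0.873650]
AA = [-0.110541, -0.117066, -0.113325, -0.110921, -0.116699, -0.119730, -0.110802, -0.102498]
AB = [-5.346490, -5.328141, -5.461622, -5.728871, -5.480726, -5.121632, -5.491946, -5.243878]
AC = [-2.303733, -2.247900, -2.274065, -2.285339, -2.244501, -2.236103, -2.294975, -2.384446]


def total_probability(log_values):
    """Sum of exp(v) over a collection of natural-log probabilities."""
    return sum(math.exp(v) for v in log_values)


start_total = total_probability(START)
# One total per step: helix-to-helix + helix-to-sheet + helix-to-coil.
helix_totals = [total_probability(step) for step in zip(AA, AB, AC)]

print(round(start_total, 4))
print([round(t, 4) for t in helix_totals])
```

The same check can be repeated for the BA/BB/BC and CA/CB/CC rows and for the parameters in Appendix B.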


Appendix B

This appendix lists the parameters of our negative test model. The values are listed as natural logarithms of probabilities. The parameter nin gives the start probabilities of the model for α-helix, β-sheet and coil. The parameter NG gives the probabilities of the amino acid glycine in our test model; the other amino acids follow in the same way, with N prefixed to their one-letter codes. For each amino acid, the first { } gives the probabilities of the nine blocks in α-helix, the second { } those of the nine blocks in β-sheet, and the third { } those of the nine blocks in coil. NAA lists the probabilities of going from α-helix to α-helix along the model, NAB those from α-helix to β-sheet, and NAC those from α-helix to coil.

in {-2.1343466431956153, -0.8006154179037511, -0.8378825800382865}

NG

{-3.7680764957751296, -3.8365632757841213, -3.7475413667625226, -3.855301470932434, -3.4678994023300698, -3.477004589568837, -3.7213185843154126, -3.9077280811452653, -3.7292307496375035}

{-3.6490878839653975, -3.3375011327273745, -3.5036691134140905, -3.2416976747520474, -3.120620134834119, -3.245611981359237, -3.4764152394471757, -3.2147185887130454, -3.1174141988127175}

{-1.892917618914803, -1.9593774214036348, -1.879944679494974, -1.8998086748098095, -1.8355803103900716, -1.9991771782368006, -1.8320666253594147, -1.8503667830759811, -1.8438220272972863}

NA

{-2.7029868991456634, -2.725829626583667, -2.6176765345903084, -2.6231577896398015, -2.5899584758307808, -2.4349045111640697, -2.6621202641881796, -2.387902327400852, -2.515016451259371}

{-2.731255923941309, -3.3264104460332162, -2.774841337834309, -2.9150395075435083, -2.711596304108961, -2.9771755066226895, -3.063937032783139, -3.19479455799314, -3.0532359731213066}

{-2.512351902061136, -2.4448234496725725, -2.5529630926243057, -2.436609784979061, -2.339732887468214, -2.6837357002189006, -2.6400141618378195, -2.5626639587729825, -2.6092043664655775}

NV

{-3.644207014062344, -3.259675901340038, -3.2085448660298352, -3.518829234311221, -3.0267741138230213, -3.2003723533036412, -3.119143181961194, -3.4112911948313744, -2.854988254006848}

{-2.2231032739062084, -2.1072955116560808, -2.349640985604412, -2.23807765140458, -2.0610859669905164, -2.2205062967142695, -2.0786703715392894, -2.1150748404382735, -2.096685566961451}

{-3.578947976830287, -3.4448450744571955, -3.5732063141769084, -3.7347968785886847, -3.5303354572309784, -3.646112160766017, -3.585065045887237, -3.762448901758444, -3.248671397561899}

NL

{-2.2198128067101828, -2.18649850981975, -2.6100136618447394, -2.007972150549568, -1.8369131536897667, -2.3008887397458673, -2.5527477070403926, -2.0249968337114836, -2.282311766701178}

{-1.9846361006535087, -2.1700612072058685, -2.2333565265809434, -2.0809344506408682, -2.0052155823865756, -2.4010409186762898, -2.0024299498154914, -2.2067832669277725, -2.1055513355937743}

{-3.210649308597397, -3.2746049386374447, -3.4570997248477773, -3.439612895485109, -3.326119915802287, -3.364418130158859,

NI

{-3.59680477516776, -3.4896923319420097, -3.4870102833766925, -3.1270629705612185, -3.155766234014125, -3.0250194658257796, -3.0137826663033676, -3.0502778492940434, -2.999978564435442}

{-2.8047322838839177, -2.7367872333237955, -3.3012516926169977, -3.1191565016492473, -2.802652666793486, -3.218133752168109, -2.7813505127579585, -3.0911046327458687, -3.1260973213861782}

{-3.95138255926197, -4.484117923338996, -4.56922573268261, -4.277241658600288, -4.197774281080717, -4.324444255540822, -4.259814861307467, -4.156044434311225, -3.977283287694127}

NF

{-3.7250591106914386, -3.8244419152517763, -3.7475413667625226, -3.649449416728285, -3.561231342122289, -3.5335167998321793, -3.466788719180532, -3.4112911948313744, -3.3043475556722375}

{-2.903362887325545, -3.074279357845754, -3.2256489518740468, -3.1578710138299377, -3.465146106262429, -2.911026735548497, -3.221906940677393, -2.8343331650019046, -3.178309030043091}

{-4.209424050427457, -3.9157224477515276, -3.721927872295406, -3.9831037858056355, -4.26062218474038, -3.753132427842208, -3.968234148440028, -4.023619848661893, -3.966511190712216}

NY

{-4.20139855838626, -3.515894704336034, -3.771638918341583, -3.929409443086156, -3.676300671907076, -3.828981012726015, -3.3403045720714126, -3.1881476435171647, -3.8543938925915096}

{-2.3361108248651337, -2.4234433753575315, -2.3523400425735765, -2.4487699053277505, -2.5496103763116427, -2.101146388195005, -2.381007054769849, -2.321604591089828, -2.447591690819181}


{-3.1440689835158295, -3.265438971623365, -3.3350998901859863, -3.2465125686586918, -3.6940806284519976, -3.405803346321713, -3.3603312477496923, -3.2924452725127082, -3.3571338932177293}

NW

{-4.517735886598417, -4.63236461040415, -4.246532532881511, -4.520277774525683, -4.586512357704849, -4.4615035714695255, -3.6927452118713564, -3.805945386835323, -4.142075965043291}

{-2.7139526932538556, -3.693114505441148, -2.669274657862795, -3.1920750916008016, -3.188372370574786, -3.3441531817450083, -3.3352356259843963, -3.917197096701875, -3.6709247424296585}

{-4.47553363762511, -4.182642528754879, -4.647003433725294, -4.598289424895869, -4.8758078238306135, -4.664452146994343, -4.20410425429346, -4.205941405455848, -4.359553778821823}

NH

{-4.695417063835869, -4.947445657044045, -4.712622262806109, -4.968302497052643, -4.419458273041683, -4.655659585910484, -4.521437884427526, -4.349560833424304, -4.45223089334713}

{-3.9088847529276425, -3.733715481063388, -4.196816293974036, -3.9206378286676435, -4.501581149978647, -4.265310103270336, -4.522679128359122, -4.648565490082018, -4.959405793527106}

{-4.141678061895068, -4.515468453223073, -4.654383541022916, -4.649582719283419, -4.532218119440537, -3.8918981888299866, -4.199190239491031, -4.004425401405745, -4.4028037626156395}

ND
