An efficient mechanism for prediction of protein-ligand interactions based on analysis of protein tertiary substructures


Darby Tien-Hau Chang, Chien-Yu Chen, Yen-Jen Oyang

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

Corresponding author. Email: [email protected]; [email protected]. Tel: 2-23625336 #431. Fax: +886-23688675. This research is sponsored by the National Science Council of R.O.C. under contract NSC 92-2323-B-002-013.

Hsueh-Fen Juan

Institute of Biotechnology and Department of Chemical Engineering, National Taipei University of Technology, Taipei, Taiwan, R.O.C.

Hsuan-Cheng Huang

Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan, R.O.C.

Abstract

Analysis of protein-ligand interactions is a fundamental issue in drug design. The detailed and accurate analysis of protein-ligand interactions involves calculating binding free energy based on thermodynamics and even quantum mechanics, which is highly expensive in terms of computing time. For this reason, conformational and structural analysis of proteins and ligands has been widely employed as a screening process in computer-aided drug design. In this paper, an efficient mechanism for identifying possible protein-ligand interactions based on analysis of protein tertiary substructures is proposed. In one experiment reported in this paper, the proposed prediction mechanism has been exploited to obtain some clues about a hypothesis that biochemists have been speculating about. The main distinction in the design of the prediction mechanism is the filtering process incorporated to expedite the analysis. The filtering process extracts the residues located in a cave of the protein tertiary structure for analysis and operates with O(n log n) time complexity, where n is the number of residues in the protein. In comparison, the Dhull algorithm, which is widely used in computer graphics for identifying the instances on the contour of a 3-dimensional object, features O(n²) time complexity. Experimental results show that the filtering process presented in this paper is able to speed up the analysis by a factor of 3.11 to 9.79.

Keywords: protein structural analysis, protein tertiary structure, kernel density estimation.

1. Introduction

One of the fundamental issues in drug design is the analysis of protein-ligand interactions [9, 11]. The detailed and accurate analysis of protein-ligand interactions involves calculating binding free energy based on thermodynamics and even quantum mechanics [2, 5]. However, this approach is highly expensive in terms of computing time. As a result, conformational and structural analysis of proteins and ligands has been widely employed as a screening process in computer-aided drug design [6, 14, 15, 16].

In this paper, an efficient mechanism for identifying possible protein-ligand interactions based on analysis of protein tertiary substructures is proposed. Fig. 1 illustrates one application that the proposed prediction mechanism addresses. In this application, the biochemist is given the crystal structure of a protein bound with a specific ligand and wants to search the PDB [4] for other proteins that could interact with the same ligand. In one experiment presented in this paper, the proposed prediction mechanism has been exploited to investigate whether some proteins in the caspase family contain a binding site similar to that of the integrin structure reported in [18]. The experimental results are consistent with a hypothesis that biochemists have been speculating about. In the application illustrated in Fig. 1, only the substructures in a cave of the protein tertiary structure are of interest. Therefore, to expedite the analysis, it is desirable to incorporate a mechanism that can effectively extract the residues in a cave of the protein tertiary structure. In this paper, an efficient filtering process with O(n log n) time complexity is employed, where n is the number of residues in the protein. In comparison with the Dhull algorithm [7], which is widely used in computer graphics for identifying the instances on the contour of a 3-dimensional object, the filtering process employed in this paper features a lower time complexity, O(n log n) versus O(n²). Experimental results show that the filtering process presented in this paper is able to speed up the analysis by a factor of 3.11 to 9.79.

In the remainder of this paper, Section 2 elaborates the proposed prediction mechanism. Section 3 reports the experiments conducted to evaluate the proposed prediction mechanism. Finally, concluding remarks are presented in Section 4.

Fig. 1. One application that the proposed prediction mechanism addresses.

2. The prediction mechanism

The prediction mechanism that we have developed for the problem illustrated in Fig. 1 carries out the analysis in two steps. In the first step, a filtering process based on an efficient kernel density estimation algorithm is invoked to identify the crucial tertiary substructures on the contour of the protein on which the analysis should focus. In the second step, the geometric hashing algorithm from computer vision [8, 10] is invoked to compare the crucial substructures of the target protein with the binding/active site of the reference protein. In this paper, we refer to the protein that contains the binding/active site of interest as the reference protein, and to the proteins in the PDB with which the alignment is to be conducted as the target proteins.

The efficient kernel density estimation algorithm that forms the basis of the filtering process treats a given compact set of instances {s1, s2, …, sn} in the vector space as n samples randomly taken from a probability distribution of unknown form, and employs the learning algorithm that we have recently proposed [12, 13] to construct an approximate probability density function of the following form:

$$\hat f(\mathbf v)=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\beta}{\lambda\,\sigma_i}\right)^{m}\exp\!\left(-\frac{\lVert\mathbf v-\mathbf s_i\rVert^{2}}{2\sigma_i^{2}}\right),\qquad(1)$$

where

(i) v is a vector in an m-dimensional vector space,

(ii) β is the parameter that controls the smoothness of the approximation function,

(iii) $\sigma_i=\beta\,\delta_i=\beta\cdot\dfrac{\sqrt{\pi}\,R(\mathbf s_i)}{\left[(k+1)\,\Gamma\!\left(\frac{m}{2}+1\right)\right]^{1/m}}$ and $R(\mathbf s_i)=\dfrac{m+1}{m}\cdot\dfrac{1}{k}\sum_{h=1}^{k}\lVert\hat{\mathbf s}_h-\mathbf s_i\rVert$, where ŝ1, ŝ2, …, ŝk are the k nearest neighbors of si and k is a parameter to be set by the user,

(iv) $\lambda=\sum_{h=-\infty}^{\infty}\exp\!\left(-\dfrac{h^{2}}{2\beta^{2}}\right)$.
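To make equation (1) concrete, the following minimal pure-Python sketch computes σi from the k nearest neighbors as in definition (iii), λ as in definition (iv), and then evaluates f̂(v). It is an illustration under stated assumptions, not the authors' implementation: neighbors are found by brute force rather than with a kd-tree, the infinite sum for λ is truncated, and the helper names (`bandwidths`, `lam`, `density_eq1`) are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lam(beta: float) -> float:
    """lambda of definition (iv); the sum over all integers h is truncated at
    |h| <= ceil(12 * beta) + 1, beyond which the terms are negligible."""
    h_max = int(math.ceil(12 * beta)) + 1
    return sum(math.exp(-h * h / (2 * beta * beta)) for h in range(-h_max, h_max + 1))


def bandwidths(samples: Sequence[Point], k: int, beta: float) -> List[float]:
    """sigma_i = beta * delta_i, with delta_i built from R(s_i) as in (iii)."""
    n, m = len(samples), len(samples[0])
    assert 0 < k < n, "need at least k + 1 samples"
    sigmas = []
    for i, s in enumerate(samples):
        # Brute-force k nearest neighbours (a kd-tree would be faster).
        nearest = sorted(dist(s, t) for j, t in enumerate(samples) if j != i)[:k]
        r = (m + 1) / m * sum(nearest) / k  # R(s_i)
        delta = math.sqrt(math.pi) * r / ((k + 1) * math.gamma(m / 2 + 1)) ** (1 / m)
        sigmas.append(beta * delta)
    return sigmas


def density_eq1(v: Point, samples: Sequence[Point], sigmas: Sequence[float], beta: float) -> float:
    """Approximate density f_hat(v) of equation (1)."""
    n, m = len(samples), len(samples[0])
    lm = lam(beta)
    total = 0.0
    for s, sg in zip(samples, sigmas):
        total += (beta / (lm * sg)) ** m * math.exp(-dist(v, s) ** 2 / (2 * sg * sg))
    return total / n


# Usage: on a 7 x 7 grid, the centre should score higher than a corner.
grid = [(float(x), float(y)) for x in range(7) for y in range(7)]
sig = bandwidths(grid, k=8, beta=0.9)
print(density_eq1((3.0, 3.0), grid, sig, 0.9), density_eq1((0.0, 0.0), grid, sig, 0.9))
```

The grid usage previews the property the filtering process relies on: an instance surrounded by other instances receives a larger function value than one on the boundary.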

One interesting observation is that, regardless of which ratio β = σi/δi is employed, we have λ ≈ √(2π)·β. If this observation can be proved to be generally correct, then we can further simplify equation (1) and obtain

$$\hat f(\mathbf v)=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\right)^{m}\exp\!\left(-\frac{\lVert\mathbf v-\mathbf s_i\rVert^{2}}{2\sigma_i^{2}}\right).\qquad(2)$$
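The observation λ ≈ √(2π)·β can be checked numerically for the β values used in Table 1. As an aside that is not part of the original argument, the Poisson summation formula gives λ = √(2π)·β·Σj exp(−2π²β²j²) over all integers j, so the relative gap is about 2·exp(−2π²β²): roughly 4% at β = 0.45 and negligible at β = 0.9. A minimal sketch, with hypothetical function names:

```python
import math


def lam(beta: float) -> float:
    """Truncated lambda = sum over integers h of exp(-h^2 / (2 beta^2))."""
    h_max = int(math.ceil(12 * beta)) + 1
    return sum(math.exp(-h * h / (2 * beta * beta)) for h in range(-h_max, h_max + 1))


def relative_gap(beta: float) -> float:
    """lambda / (sqrt(2 pi) beta) - 1."""
    return lam(beta) / (math.sqrt(2 * math.pi) * beta) - 1


def poisson_prediction(beta: float) -> float:
    """Leading term of the gap predicted by Poisson summation."""
    return 2 * math.exp(-2 * math.pi ** 2 * beta ** 2)


for beta in (0.45, 0.9):  # beta_1 and beta_2 from Table 1(a)
    print(f"beta = {beta}: gap = {relative_gap(beta):.2e}, "
          f"Poisson estimate = {poisson_prediction(beta):.2e}")
```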

As the approximate probability density function presented in equations (1) and (2) is a continuous and smooth function in the vector space, we can expect the function values at the instances located on the boundary of the set of instances to be generally smaller than the function values at the rest of the instances, since a boundary instance has neighbors on only one side. Accordingly, we can set a threshold on the function values to distinguish the instances located on the boundary from those that are not. Fig. 2 depicts a 2-dimensional example that illustrates the effect of the filtering process. In this example, a 2-dimensional object is composed of a number of primitive instances, represented by dots in the figure. Fig. 2(a) shows the instances that the filtering process identifies as being on the boundary of the object.

(a) The effect after the instances on the boundary of the object have been identified.

(b) The effect after the instances in the caves of the object have been identified.

Fig. 2. An example that illustrates the effects of the filtering process.

Once the instances on the boundary of the object have been identified, the next task of the filtering process is to further classify each of these instances according to whether or not it is located in a cave of the object. This task can be carried out by applying equation (1) or (2) again, but with a larger β value. A larger β value yields a smoother approximate probability density function, in which each instance is influenced by more distant instances. As a result, the function values at the instances located in a cave, which is surrounded by the object on several sides, will generally be larger than the function values at the instances that are on the boundary of the object but not in a cave. Accordingly, a threshold can be set to classify these instances. Fig. 2(b) shows the final result obtained in this example, and Fig. 3 shows the pseudocode of the filtering process.

With the filtering process carried out for both the reference protein and the target protein, the next task of the proposed prediction mechanism is to conduct structural alignment on the crucial substructures identified. In the proposed prediction mechanism, analysis is carried out at the residue level, with each residue represented by its alpha carbon in the vector space. In other words, a protein substructure is defined by the coordinates of the alpha carbons included in the substructure. In our implementation, we have adopted the common practice for carrying out protein structural alignment with the geometric hashing algorithm [6, 14, 15, 16, 17]. With this practice, the coordinate systems examined by the geometric hashing algorithm are limited to those defined by the two backbone bonds connected to the alpha carbon of each residue. With the filtering process elaborated above, the geometric hashing algorithm in our implementation further narrows its search space to only the coordinate systems defined by the residues located in a cave.
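One common way to build such a residue-centred coordinate system is sketched below, assuming the backbone N, Cα and C coordinates of each residue are available: the origin is placed at Cα, the x axis follows the Cα→C bond, and the z axis is normal to the plane spanned by the two backbone bonds. The axis convention, the bin size and the helper names are assumptions for illustration; the paper does not specify them. Because the frame moves with the residue, the local coordinates of other alpha carbons, and therefore their hash keys, do not change when the whole protein is rotated or translated.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def residue_frame(n_atom: Vec, ca: Vec, c_atom: Vec):
    """Right-handed orthonormal frame with origin at CA, from the two backbone bonds."""
    x = _unit(_sub(c_atom, ca))             # x axis along the CA->C bond
    z = _unit(_cross(x, _sub(n_atom, ca)))  # z axis normal to the N-CA-C plane
    y = _cross(z, x)                        # unit length because z is perpendicular to x
    return ca, (x, y, z)


def to_local(point: Vec, frame) -> Vec:
    """Coordinates of `point` expressed in the residue frame."""
    origin, axes = frame
    d = _sub(point, origin)
    return (_dot(d, axes[0]), _dot(d, axes[1]), _dot(d, axes[2]))


def hash_key(local: Vec, bin_size: float = 1.0) -> Tuple[int, ...]:
    """Quantise local coordinates into a hash-table bin (bin size is hypothetical)."""
    return tuple(math.floor(c / bin_size) for c in local)


n, ca, c = (-0.5, 2.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)
frame = residue_frame(n, ca, c)
local = to_local((2.5, 3.5, 4.5), frame)
print(local, hash_key(local))
```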

Algorithm: kernel density estimation based filtering
Input: a set S = {s1, s2, …, sn} of instances in the vector space and parameters β1, β2, k, r1, and r2.
Output: Ŝ, a subset of S.
Set Ŝ ← S.
For each si ∈ S, compute f̂(si) according to equation (1) with β = β1.
Set w ← max{f̂(si) : 1 ≤ i ≤ n}.
For each si ∈ S: if f̂(si) ≥ r1·w, then Ŝ ← Ŝ − {si}.
For each si ∈ S, compute f̂(si) according to equation (1) with β = β2.
Set w ← max{f̂(si) : 1 ≤ i ≤ n}.
For each si ∈ Ŝ: if f̂(si) ≤ r2·w, then Ŝ ← Ŝ − {si}.
Return Ŝ.

Fig. 3. The pseudocode of the kernel density estimation based filtering process.
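The pseudocode in Fig. 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it uses brute-force neighbor search, and it evaluates the simpler Gaussian form of equation (2) instead of equation (1). Within one pass the two forms differ only by the constant factor (√(2π)·β/λ)^m, and because every comparison is against a fraction of that pass's own maximum w, the constant cancels and the kept set is the same. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _bandwidths(samples: Sequence[Point], k: int, beta: float) -> List[float]:
    """sigma_i = beta * delta_i from the k nearest neighbours (brute force)."""
    m = len(samples[0])
    out = []
    for i, s in enumerate(samples):
        nearest = sorted(_dist(s, t) for j, t in enumerate(samples) if j != i)[:k]
        r = (m + 1) / m * sum(nearest) / k
        out.append(beta * math.sqrt(math.pi) * r / ((k + 1) * math.gamma(m / 2 + 1)) ** (1 / m))
    return out


def _scores(samples: Sequence[Point], k: int, beta: float) -> List[float]:
    """f_hat(s_i) for every sample, using the Gaussian form of equation (2)."""
    m = len(samples[0])
    sig = _bandwidths(samples, k, beta)
    return [sum((1 / (math.sqrt(2 * math.pi) * sg)) ** m
                * math.exp(-_dist(v, s) ** 2 / (2 * sg * sg))
                for s, sg in zip(samples, sig)) / len(samples)
            for v in samples]


def kde_filter(samples: Sequence[Point], beta1: float, beta2: float,
               k: int, r1: float, r2: float) -> List[int]:
    """Indices of the instances kept by the two-pass filter of Fig. 3."""
    f1 = _scores(samples, k, beta1)
    w1 = max(f1)
    kept = [i for i in range(len(samples)) if f1[i] < r1 * w1]  # drop interior instances
    f2 = _scores(samples, k, beta2)
    w2 = max(f2)
    return [i for i in kept if f2[i] > r2 * w2]  # drop boundary instances not in a cave


# Usage with the thresholds of Table 1(a) on random 2-D points (k reduced to 8).
rng = random.Random(0)
pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
print(kde_filter(pts, beta1=0.45, beta2=0.9, k=8, r1=0.45, r2=0.63))
```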


As far as the time complexity of the prediction mechanism is concerned, equations (1) and (2) require identifying the k nearest neighbors of each of the n instances. If the kd-tree structure [3] is incorporated, then the average time complexity of constructing a kd-tree with n instances and finding the k nearest neighbors of all n instances is O(n log n), if k is considered a constant. One practical implementation employed in this paper for evaluating equation (1) is to include only the k′ instances nearest to vector v, since the influence of the Gaussian function decreases rapidly with distance. With this practice, the time complexity of evaluating the approximate function value at one instance is O(k′ log n), and the overall average time complexity of the filtering process is O(n log n), if both k and k′ are considered constants. Concerning the structural alignment process, as the geometric hashing algorithm narrows its search space to only the coordinate systems defined by the residues located in the caves of the reference protein and the target protein, the time complexity of comparing the crucial substructures of these two proteins is O(n′₁n′₂(n′₁ q)), where n′₁ and n′₂ are the numbers of residues identified as in the caves of the two proteins, respectively, and q is the number of residues in the binding site of the reference protein.
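The neighbor search that dominates this cost can use a kd-tree [3]. The sketch below is a plain-Python illustration of the idea rather than the implementation used in the paper; for simplicity it sorts at every level, which costs O(n log² n) to build, whereas a linear-time median selection gives the O(n log n) bound quoted above. The names `build_kdtree` and `knn` are hypothetical.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


class KDNode:
    """A kd-tree node storing one point (by index) and its splitting axis."""
    __slots__ = ("index", "axis", "left", "right")

    def __init__(self, index: int, axis: int, left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.index, self.axis, self.left, self.right = index, axis, left, right


def build_kdtree(points: Sequence[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[KDNode]:
    """Median-split construction, cycling through the coordinate axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def knn(points: Sequence[Point], root: Optional[KDNode], query: Point,
        k: int, exclude: Optional[int] = None) -> List[int]:
    """Indices of the k points nearest to `query`, closest first."""
    heap: List[tuple] = []  # max-heap on squared distance, stored as (-d2, index)

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        p = points[node.index]
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if node.index != exclude:
            if len(heap) < k:
                heapq.heappush(heap, (-d2, node.index))
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, (-d2, node.index))
        diff = query[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it is closer than the current k-th neighbour.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [i for _, i in sorted(heap, reverse=True)]


rng = random.Random(1)
pts = [tuple(rng.uniform(0, 50) for _ in range(3)) for _ in range(500)]
tree = build_kdtree(pts)
print([round(math.dist(pts[0], pts[j]), 2) for j in knn(pts, tree, pts[0], k=30, exclude=0)[:5]])
```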

3. Experimental results

This section reports two experiments conducted to evaluate the proposed prediction mechanism. The main objective of the first experiment is to test the accuracy of the proposed prediction mechanism. The second experiment demonstrates how biochemists can exploit the proposed prediction mechanism to facilitate their research. Table 1 shows the parameter settings for the filtering process in the experiments and the criteria for successful alignment of two alpha carbons in the geometric hashing algorithm. In the experiments, the likelihood of residue substitution is also taken into account: if the entry in the PAM250 matrix [1, 11] corresponding to a pair of residues aligned by the geometric hashing algorithm is smaller than 2, then this pair of residues is excluded from the list of successfully aligned pairs.
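The PAM250 check described above amounts to a table lookup followed by a threshold. The sketch below is an illustration under a loud assumption: instead of the full PAM250 matrix, it contains only the residue pairs and scores that appear in Table 3(b), so any other pair raises `KeyError`. The names `PAM250_EXCERPT`, `pam250` and `keep_substitutable` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Excerpt of PAM250: only the pairs (and scores) listed in Table 3(b).
_ENTRIES = [
    ("TYR", "TYR", 10), ("ASP", "GLU", 3), ("ASP", "GLN", 2), ("SER", "SER", 2),
    ("ASP", "ASN", 2), ("ARG", "LYS", 3), ("ASP", "ASP", 4), ("PRO", "PRO", 6),
    ("GLU", "GLU", 4), ("ALA", "ALA", 2), ("GLU", "GLN", 2),
]
PAM250_EXCERPT = {tuple(sorted((a, b))): score for a, b, score in _ENTRIES}


def pam250(a: str, b: str) -> int:
    """Symmetric lookup; raises KeyError for pairs outside the excerpt."""
    return PAM250_EXCERPT[tuple(sorted((a, b)))]


def keep_substitutable(pairs: Iterable[Tuple[str, str]], min_score: int = 2) -> List[Tuple[str, str]]:
    """Drop aligned residue pairs whose PAM250 entry is smaller than min_score."""
    return [(a, b) for a, b in pairs if pam250(a, b) >= min_score]


aligned = [("TYR", "TYR"), ("GLN", "ASP"), ("LYS", "ARG")]
print(keep_substitutable(aligned))               # threshold used in the experiments
print(keep_substitutable(aligned, min_score=3))  # a stricter, hypothetical threshold
```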

In the first experiment, three datasets, each containing a reference structure and a number of target proteins, are used to test whether the proposed prediction mechanism is able to identify the region on the contour of the target protein that contains a substructure similar to that of the reference protein. Table 2(a) shows the characteristics of the three reference protein structures. The first two reference structures are two alcohol dehydrogenase enzymes in the PDB, PDB ID = 1hdz and 1b15, and the third reference structure, PDB ID = 1l5g, contains an integrin αVβ3 bound with a peptide ligand, as reported in [18]. For each of the two enzymes, 5 proteins from the same family in the PDB are employed as the target proteins. For integrin, alternative structures of integrin with different bindings, PDB ID = 1jv2 and 1m1x, are employed as the target proteins. Table 2(b) reports the results of the first experiment. The results show that the proposed prediction mechanism identifies the residues in the binding/active sites of the target proteins with a high degree of accuracy. The only miss occurs when protein 1hj6 is aligned with reference protein 1b15. However, as Table 2(b) shows, the miss is not due to the filtering process invoked to expedite the analysis: even without the filtering process, the geometric hashing algorithm can successfully align only 6 of the 8 residues in the active site of protein 1hj6 with the residues in the active site of the reference protein.

In the second experiment, the proposed prediction mechanism is invoked to determine whether some proteins in the caspase family may contain a binding site similar to that of integrin. Table 3 shows the results output by the proposed prediction mechanism. It is observed that caspase-7 (PDB ID = 1f1j), caspase-8 (PDB ID = 1f9e), and caspase-9 (PDB ID = 1jxq) have the largest numbers of residues successfully aligned with the residues in the binding site of integrin. This result is consistent with a hypothesis that biochemists have been speculating about. However, the outputs of the proposed prediction mechanism can only be regarded as interesting clues and, as Table 3(b) shows, multiple possible alignments are typically found. Therefore, biochemists must conduct more in-depth analyses, such as protein docking or protein affinity analysis, to further verify the hypothesis.

Table 1. Parameter settings in the experiments.

| Parameter | Value |
|---|---|
| β1 in the pseudocode | 0.45 |
| β2 in the pseudocode | 0.9 |
| k in equations (1) and (2) | 30 |
| k′: the number of nearest Gaussian functions involved in evaluating equation (1) or (2) | 30 |
| r1 in the pseudocode | 0.45 |
| r2 in the pseudocode | 0.63 |

(a) Parameter settings for the filtering process.

||v′| − |v″|| ≤ 7 Å
|θx′ − θx″| ≤ 0.2 radian
|θy′ − θy″| ≤ 0.2 radian
|v′ − v″| ≤ 6 Å

Here v′ and v″ are the vectors from the origin to the two alpha carbons, respectively; θx′ and θx″ are the angles between the x axis and the vectors v′ and v″, respectively; and θy′ and θy″ are the angles between the y axis and the vectors v′ and v″, respectively.

(b) Criteria for successful alignment of two alpha carbons in the geometric hashing algorithm.
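The four criteria of Table 1(b) can be expressed as a single predicate on the two position vectors, both measured in the same residue-centred coordinate system. This is a minimal sketch: the distance tolerances are passed in as arguments (Table 1(b) gives the values used in the experiments), the 0.2-radian angle tolerance is the default, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _norm(v: Vec) -> float:
    return math.sqrt(sum(c * c for c in v))


def _angle_to_axis(v: Vec, axis: int) -> float:
    """Angle between v and the x (axis=0) or y (axis=1) coordinate axis."""
    cos = v[axis] / _norm(v)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def alpha_carbons_match(v1: Vec, v2: Vec, max_len_diff: float, max_dist: float,
                        max_angle: float = 0.2) -> bool:
    """True if the two alpha-carbon vectors satisfy all four alignment criteria."""
    if _norm(v1) == 0 or _norm(v2) == 0:
        raise ValueError("a zero vector has no direction")
    diff = [a - b for a, b in zip(v1, v2)]
    return (abs(_norm(v1) - _norm(v2)) <= max_len_diff
            and abs(_angle_to_axis(v1, 0) - _angle_to_axis(v2, 0)) <= max_angle
            and abs(_angle_to_axis(v1, 1) - _angle_to_axis(v2, 1)) <= max_angle
            and _norm(diff) <= max_dist)


print(alpha_carbons_match((5.0, 0.0, 0.0), (5.0, 0.2, 0.0), max_len_diff=1.0, max_dist=1.0))
print(alpha_carbons_match((5.0, 0.0, 0.0), (0.0, 5.0, 0.0), max_len_diff=1.0, max_dist=1.0))
```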


Table 2. Data of the first experiment.

| PDB ID | # of residues | # of residues in the binding/active site | # of residues remaining with the filtering process applied |
|---|---|---|---|
| 1hdz | 748 | 14 | 307 |
| 1b15 | 508 | 8 | 203 |
| 1l5g | 1470 | 18 | 833 |

(a) Characteristics of the reference proteins.

Reference protein 1hdz

| Target protein | 3hud | 1htb | 1hdy | 1deh | 1hdx |
|---|---|---|---|---|---|
| # of residues in the active site | 14 | 14 | 14 | 14 | 14 |
| Without filtering: execution time of geometric hashing (s) | 130.24 | 114.68 | 129.57 | 117.13 | 137.43 |
| Without filtering: # of residues in the active site successfully aligned | 14 | 14 | 14 | 14 | 14 |
| Without filtering: RMSD of aligned pairs | 0.79 | 0.37 | 0.47 | 0.36 | 0.49 |
| With filtering: execution time of filtering (s) | 0.18 | 0.2 | 0.18 | 0.19 | 0.18 |
| With filtering: execution time of geometric hashing (s) | 27.78 | 25.38 | 25.45 | 24.76 | 26.2 |
| With filtering: # of residues in the active site successfully aligned | 14 | 14 | 14 | 14 | 14 |
| With filtering: RMSD of aligned pairs | 0.96 | 0.37 | 0.54 | 0.36 | 0.66 |
| Speedup due to the filtering process | 4.66 | 4.48 | 5.06 | 4.69 | 5.21 |

Reference protein 1b15

| Target protein | 1ide | 1hj6 | 1idc | 1idd | 1idf |
|---|---|---|---|---|---|
| # of residues in the active site | 8 | 8 | 8 | 8 | 8 |
| Without filtering: execution time of geometric hashing (s) | 16.75 | 15.89 | 16.7 | 15.7 | 16.7 |
| Without filtering: # of residues in the active site successfully aligned | 8 | 6 | 8 | 8 | 8 |
| Without filtering: RMSD of aligned pairs | 0.52 | 0.42 | 0.55 | 0.49 | 0.43 |
| With filtering: execution time of filtering (s) | 0.08 | 0.07 | 0.08 | 0.08 | 0.08 |
| With filtering: execution time of geometric hashing (s) | 2.55 | 2.13 | 2.05 | 2.18 | 2.13 |
| With filtering: # of residues in the active site successfully aligned | 8 | 6 | 8 | 8 | 8 |
| With filtering: RMSD of aligned pairs | 0.56 | 0.54 | 0.62 | 0.5 | 0.57 |
| Speedup due to the filtering process | 6.37 | 7.22 | 7.84 | 6.95 | 7.56 |

Reference protein 1l5g

| Target protein | 1jv2 | 1m1x |
|---|---|---|
| # of residues in the binding site | 18 | 18 |
| Without filtering: execution time of geometric hashing (s) | 1051.5 | 1036.22 |
| Without filtering: # of residues in the binding site successfully aligned | 18 | 18 |
| Without filtering: RMSD of aligned pairs | 1.21 | 1.22 |
| With filtering: execution time of filtering (s) | 0.44 | 0.45 |
| With filtering: execution time of geometric hashing (s) | 338.12 | 318.66 |
| With filtering: # of residues in the binding site successfully aligned | 18 | 18 |
| With filtering: RMSD of aligned pairs | 1.23 | 1.22 |
| Speedup due to the filtering process | 3.11 | 3.25 |

(b) Experimental results, where RMSD stands for root-mean-square deviation.

Concerning the experimental results shown in Table 2 and Table 3, a few issues deserve further discussion. The first issue concerns the speedups reported in these two tables, which range from 3.11 to 9.79. In fact, these numbers merely reflect a subjective tradeoff between the accuracy of the analysis results and the magnitude of the speedup obtained. As the expediting mechanism works by reducing the number of residues included in the analysis, in principle we could set the parameters listed in Table 1 conservatively and obtain only an insignificant speedup. On the other hand, if we set the parameters listed in Table 1 more aggressively, we could obtain a higher speedup but might lose some analysis accuracy as a consequence.
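The speedups in Tables 2 and 3 appear to equal the geometric hashing time without filtering divided by the sum of the filtering time and the geometric hashing time with filtering. The short check below reproduces several reported values under that reading; the row selection and the name `speedup` are illustrative.

```python
# (target, hashing time without filtering, filtering time,
#  hashing time with filtering, reported speedup), copied from Tables 2 and 3
ROWS = [
    ("3hud (reference 1hdz)", 130.24, 0.18, 27.78, 4.66),
    ("1htb (reference 1hdz)", 114.68, 0.20, 25.38, 4.48),
    ("1jv2 (reference 1l5g)", 1051.5, 0.44, 338.12, 3.11),
    ("1m1x (reference 1l5g)", 1036.22, 0.45, 318.66, 3.25),
    ("1gqf (Table 3)", 149.93, 0.10, 15.22, 9.79),
]


def speedup(t_plain: float, t_filter: float, t_hash_filtered: float) -> float:
    """Time without filtering over total time with filtering."""
    return t_plain / (t_filter + t_hash_filtered)


for name, t0, tf, th, reported in ROWS:
    print(f"{name}: computed {speedup(t0, tf, th):.2f}, reported {reported}")
```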


Table 3. Results of the second experiment.

| PDB ID of the target protein | 1cl5 | 1cww | 1cy5 | 1f1j | 1f9e | 1gqf |
|---|---|---|---|---|---|---|
| # of residues | 97 | 102 | 93 | 469 | 1476 | 530 |
| # of residues remaining with filtering applied | 26 | 31 | 38 | 184 | 542 | 103 |
| Without filtering: execution time of geometric hashing | 7.83 | 8.23 | 7.52 | 103.32 | 931.17 | 149.93 |
| Without filtering: # of residues in a cave successfully aligned | 9 | 10 | 9 | 16 | 15 | 14 |
| Without filtering: RMSD of aligned pairs | 3.91 | 3.46 | 3.37 | 4.24 | 3.99 | 4.75 |
| With filtering: execution time of filtering | 0.01 | 0.01 | 0.01 | 0.09 | 0.43 | 0.1 |
| With filtering: execution time of geometric hashing | 1.3 | 1.52 | 1.76 | 24 | 217.1 | 15.22 |
| With filtering: # of residues in a cave successfully aligned | 9 | 8 | 9 | 13 | 15 | 13 |
| With filtering: RMSD of aligned pairs | 3.91 | 4.08 | 3.37 | 4.01 | 4.25 | 5.06 |
| Speedup due to the filtering process | 5.98 | 5.38 | 4.25 | 4.29 | 4.28 | 9.79 |

| PDB ID of the target protein | 1jxq | 1k86 | 1k88 | 1nme | 2ygs |
|---|---|---|---|---|---|
| # of residues | 940 | 464 | 461 | 238 | 92 |
| # of residues remaining with filtering applied | 329 | 110 | 112 | 66 | 33 |
| Without filtering: execution time of geometric hashing | 452.89 | 112.72 | 109.42 | 33.73 | 7.83 |
| Without filtering: # of residues in a cave successfully aligned | 14 | 14 | 13 | 12 | 9 |
| Without filtering: RMSD of aligned pairs | 4.24 | 4.06 | 4.04 | 4.14 | 3.51 |
| With filtering: execution time of filtering | 0.22 | 0.09 | 0.09 | 0.03 | 0.02 |
| With filtering: execution time of geometric hashing | 83.45 | 18.45 | 15.43 | 4.75 | 1.54 |
| With filtering: # of residues in a cave successfully aligned | 12 | 12 | 13 | 11 | 9 |
| With filtering: RMSD of aligned pairs | 4.16 | 4.15 | 4.04 | 3.87 | 3.51 |
| Speedup due to the filtering process | 5.41 | 6.08 | 7.05 | 7.06 | 5.02 |

(a) Output of the proposed prediction mechanism.

Mapping 1

| Integrin αVβ3 (reference): chain | Residue index | Residue type | Caspase-8 (PDB ID = 1f9e): chain | Residue index | Residue type | PAM250 score |
|---|---|---|---|---|---|---|
| A | 178 | TYR | D | 320 | TYR | 10 |
| A | 218 | ASP | A | 297 | GLU | 3 |
| B | 119 | ASP | B | 388 | GLN | 2 |
| B | 121 | SER | B | 339 | SER | 2 |
| B | 122 | TYR | B | 340 | TYR | 10 |
| B | 123 | SER | B | 378 | SER | 2 |
| B | 126 | ASP | B | 351 | GLN | 2 |
| B | 158 | ASP | A | 291 | GLN | 2 |
| B | 215 | ASN | B | 381 | ASP | 2 |
| B | 216 | ARG | B | 384 | LYS | 3 |
| B | 217 | ASP | D | 323 | ASP | 4 |
| B | 219 | PRO | D | 322 | PRO | 6 |
| B | 220 | GLU | D | 324 | GLU | 4 |
| B | 251 | ASP | B | 374 | ASN | 2 |

Mapping 2

| Integrin αVβ3 (reference): chain | Residue index | Residue type | Caspase-8 (PDB ID = 1f9e): chain | Residue index | Residue type | PAM250 score |
|---|---|---|---|---|---|---|
| A | 150 | ASP | K | 289 | ASN | 2 |
| A | 178 | TYR | K | 290 | TYR | 10 |
| A | 218 | ASP | L | 385 | GLN | 2 |
| B | 119 | ASP | K | 170 | ASN | 2 |
| B | 121 | SER | K | 236 | SER | 2 |
| B | 122 | TYR | K | 244 | TYR | 10 |
| B | 126 | ASP | K | 178 | ASP | 4 |
| B | 127 | ASP | K | 180 | ASN | 2 |
| B | 215 | ASN | K | 239 | ASP | 2 |
| B | 216 | ARG | K | 240 | LYS | 3 |
| B | 217 | ASP | K | 286 | GLN | 2 |
| B | 218 | ALA | K | 284 | ALA | 2 |
| B | 219 | PRO | L | 387 | PRO | 6 |
| B | 220 | GLU | K | 283 | GLN | 2 |
| B | 251 | ASP | V | 4604 | ASP | 4 |

(b) Two possible mappings of the residues in the crucial substructures of caspase-8 to the residues in the binding site of integrin αVβ3.


Another issue that deserves further discussion is whether these parameters should be set differently when different sets of proteins are to be analyzed. In our experience, when another reference protein is given, the only parameter values in Table 1(a) that need to be adjusted are r1 and r2. That is, except for r1 and r2, the user can basically adopt the parameter values listed in Table 1(a) for a different reference protein. The main reason why only r1 and r2 need to be taken into account is that the parameter setting simply represents a subjective tradeoff between the accuracy of the analysis and the magnitude of the speedup obtained. By setting r1 to 1 and r2 to 0, we basically keep all the residues. On the other hand, by setting either r1 to 0 or r2 to 1, we eliminate all the residues. Since any desired tradeoff can be realized by adjusting r1 and r2 alone, there is no need to manipulate the other parameters. Basically, given a reference protein, the user needs to try a number of combinations of r1 and r2 and select one that accurately extracts the residues in the binding site of the reference protein.
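The trial-and-error selection of r1 and r2 described above could be automated with a small grid search. In this hypothetical sketch, `f1` and `f2` hold the function values of the two filtering passes for the residues of the reference protein, `site` holds the indices of its binding-site residues, and the search returns the (r1, r2) pair that keeps every binding-site residue while keeping the fewest residues overall. The grid spacing of 0.05 and this selection rule are assumptions, not the authors' procedure.

```python
from typing import Optional, Sequence, Set, Tuple


def kept(f1: Sequence[float], f2: Sequence[float], r1: float, r2: float) -> Set[int]:
    """Residues surviving both thresholds of Fig. 3."""
    w1, w2 = max(f1), max(f2)
    return {i for i in range(len(f1)) if f1[i] < r1 * w1 and f2[i] > r2 * w2}


def choose_thresholds(f1: Sequence[float], f2: Sequence[float], site: Set[int],
                      steps: int = 20) -> Optional[Tuple[float, float]]:
    """First (r1, r2) on the grid giving the smallest kept set that contains the site."""
    grid = [i / steps for i in range(steps + 1)]
    best, best_size = None, None
    for r1 in grid:
        for r2 in grid:
            survivors = kept(f1, f2, r1, r2)
            if site <= survivors and (best_size is None or len(survivors) < best_size):
                best, best_size = (r1, r2), len(survivors)
    return best


# Toy values (hypothetical): residues 1 and 3 form the binding site.
f1 = [1.0, 0.4, 0.3, 0.2]
f2 = [0.1, 0.9, 0.5, 1.0]
r1, r2 = choose_thresholds(f1, f2, site={1, 3})
print(r1, r2, sorted(kept(f1, f2, r1, r2)))
```

Minimizing the number of kept residues favors speed; a user who prefers a safety margin around the binding site could instead pick a pair from the middle of the feasible region.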

4. Conclusion and future work

In this paper, an efficient mechanism for predicting possible protein-ligand interactions based on analysis of protein tertiary substructures is proposed. In one experiment presented in this paper, the proposed prediction mechanism has been exploited to investigate whether some proteins in the caspase family contain a binding site similar to that of the integrin structure reported in [18], and the experimental results are consistent with a hypothesis that biochemists have been speculating about. However, the predictions made by the proposed mechanism can only be regarded as interesting clues for more in-depth investigation. The experimental results also show that the filtering process presented in this paper is able to speed up the analysis by a factor of 3.11 to 9.79. As the lessons learned from this research have been encouraging, continued investigation of the following issues is of interest. The first issue concerns whether we can identify significant structure-activity relationships (SAR) among the binding sites of proteins. With such relationships, the drug design process would be greatly facilitated. The second issue that deserves further investigation concerns the prediction of protein functions based on crucial tertiary substructures.

References:

1. Altschul, S.F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555-565.

2. Atkins, P.W. and de Paula, J. (2001) Physical Chemistry, 7th ed., W. H. Freeman & Co.

3. Bentley, J.L. (1975) Multidimensional binary search trees used for associative searching. Communications of the ACM, 18, 509-517.

4. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research, 28, 235-242.

5. Bourne, P.E. and Weissig, H., eds. (2003) Structural Bioinformatics, New Jersey: Wiley-Liss, Inc.

6. Boutonnet, N.S., Rooman, M.J., Ochagavia, M.E., Richelle, J., Wodak, S.J. (1995) Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins. Protein Eng. 8, 647-662.

7. Edelsbrunner, H. and Mücke, E.P. (1994) Three-dimensional alpha shapes. ACM Trans. Graphics, 13, 43-72.

8. Wolfson, H.J. and Rigoutsos, I. (1997) Geometric hashing: an overview. IEEE Comput. Sci. Eng. 4, 10-21.

9. Krane, D.E. and Raymer, M.L. (2002) Fundamental Concepts of Bioinformatics, Benjamin Cummings.

10. Lamdan, Y. and Wolfson, H. (1988) Geometric Hashing: A General and Efficient Model-Based Recognition Scheme, Proc. Int'l Conf. Computer Vision, 238-249.

11. Lesk, A.M. (2002) Introduction to Bioinformatics, New York: Oxford University Press.

12. Oyang, Y.-J., Chang, D.T.-H., Chen, C.-Y., and Hwang, S.-C. (2003) Expediting Protein Structural Analysis with an Efficient Kernel Density Estimation Algorithm, In Proceedings of IEEE 5th International Symposium on Multimedia Software Engineering, Taichung, Taiwan.

13. Oyang, Y.-J., Hwang, S.-C., Ou, Y.-Y., Chen, C.-Y., and Chen, Z.-W. (2002) A Novel Learning Algorithm for Data Classification with Radial Basis Function Networks, In Proceedings of 9th International Conference on Neural Information Processing (ICONIP-2002), Singapore.

14. Orengo, C. and Taylor, W. (1996) SSAP: Sequential Structure Alignment Program for Protein Structure Comparison. Methods in Enzymology, 266, 617-635.

15. Pennec, X. and Ayache, N. (1994) An O(n²) Algorithm for 3D Substructure Matching of Proteins, In A. Califano, I. Rigoutsos, and H.J. Wolfson, editors, Shape and Pattern Matching in Computational Biology - Proc. First Int. Workshop, Seattle, pp. 25-40, Plenum Publishing.

16. Pennec, X. and Ayache, N. (1998) A geometric algorithm to find small but highly similar 3D substructures in proteins, Bioinformatics, 14, 516-522.

17. Tu, J.-T. (2003) Protein Active Site Prediction By Matching 3D Structural Data, Master thesis, Department of Computer Science and Information Engineering, National Taiwan University.

18. Xiong, J.P., Stehle, T., Zhang, R., Joachimiak, A., Frech, M., Goodman, S.L., and Arnaout, M.A. (2002) Crystal structure of the extracellular segment of integrin alpha Vbeta3 in complex with an Arg-Gly-Asp ligand. Science, 296, 151-155.