Chapter 3. Results and Discussions
3.8 Web Service
PiSA-BLAST has been set up as a web service, as shown in Figure 33. Users can supply a query in one of three formats: a PDB code, a SCOP code, or an uploaded 3D structure. The query is searched against a whole structural database to find similar structures. The searchable databases include PDB, nr-PDB, SCOP all, SCOP 95, and SCOP 40.
Users obtain the ranked retrieval list of the database search, the alignment of the encoded sequences, the detailed structure comparison computed with CE, and the original amino acid sequence alignment between the query and subject proteins computed with the FASTA program [21, 22]. The web service is available at:
http://gemdock.life.nctu.edu.tw/pisa-blast/
Chapter 4
Conclusions
4.1 Summary
In summary, we provide a novel method for fast one-against-all structural database searching. Going from 3D to 1D, PiSA-BLAST reduces execution time by translating each 3D structure into a 1D sequence and aligning structures at the sequence level. Going from 1D back to 3D, PiSA-BLAST improves the accuracy of sequence alignment for structure searching by adding segment information to the 1D sequence. We use a clustering algorithm to group segments, choose representative fragments, and assign new codes for the structure transformation. We then design a rational and usable substitution matrix for the new codes. Overall, our results show that PiSA-BLAST is quite efficient and reasonably effective. Database searching with PiSA-BLAST is much faster than with CE. Although PiSA-BLAST does not reach the accuracy of CE, it can be used as a fast filter that pre-selects the top-ranked 10% to 30% of structure candidates for further evaluation. Given the speed of PiSA-BLAST, this filter-and-refine strategy can reduce the running time many-fold while maintaining the good accuracy of the detailed comparison methods.
4.2 Major Contributions and Future Perspectives
Here, we have developed a fast structure alignment tool for protein database searching.
We evaluated the retrieval efficiency and effectiveness of PiSA-BLAST in comparison with other methods. The results showed that PiSA-BLAST is much faster than two well-known protein structure comparison methods, DALI and CE, and yet does not sacrifice the accuracy of the comparison.
Because PiSA-BLAST searches databases very quickly, it can serve as a useful pre-filtering tool once protein structure databases grow too large to be searched exhaustively. In a filter-and-refine framework, it reduces the search space before a more detailed but slower structural comparison method is run. PiSA-BLAST first performs a fast alignment search and outputs the top-ranked results. A more precise but slower structure alignment tool then carries out the detailed comparison on those results, so that the combination achieves both good accuracy and good efficiency.
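The filter-and-refine framework can be sketched as follows. This is a minimal illustration and not the PiSA-BLAST implementation: `fast_score` and `slow_score` are hypothetical caller-supplied functions standing in for a fast filter such as PiSA-BLAST and a detailed method such as CE, and `keep_fraction` is an assumed parameter.

```python
import math

def filter_and_refine(query, database, fast_score, slow_score, keep_fraction=0.1):
    """Rank `database` against `query` in two stages.

    fast_score and slow_score are hypothetical callables
    (query, subject) -> float, where a higher value means more similar.
    Only the top `keep_fraction` of the fast ranking is re-scored
    with the slow method.
    """
    # Stage 1: cheap scoring of every database entry.
    ranked = sorted(database, key=lambda s: fast_score(query, s), reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    candidates = ranked[:n_keep]
    # Stage 2: expensive scoring of the surviving candidates only.
    return sorted(candidates, key=lambda s: slow_score(query, s), reverse=True)
```

With `keep_fraction` between 0.1 and 0.3, the slow method is called on only 10% to 30% of the database, which is where the saving in running time comes from.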
In future work, the accuracy of PiSA-BLAST can be further improved by using different encoding rules and adding more structural information. As the preliminary results suggest, PiSA-BLAST also has practical applications in fold assignment and homology searching. Furthermore, because our method transforms 3D structures into 1D sequences, the encoded sequences may be applied to multiple structure alignment.
Table 1. A small test set selected from previous work [14]. The database contains 200 members; the 20 queries, taken from two SCOP families, are listed
Globins family              Serine/Threonine kinase family
(sccs id a: a.1.1.2)        (sccs id: d.144.1.1)
d1a6m__                     d1a06__
d1ash__                     d1apme_
d1b0b__                     d1b6cb_
d1fhjb_                     d1csn__
d1gcva_                     d1f3mc_
d1irda_                     d1h8fa_
d1itha_                     d1howa_
d1mba__                     d1jvpp_
d2gdm__                     d1phk__
d3sdha_                     d1tkia_
a The sccs id is a compact representation of a SCOP domain classification. An sccs identifier includes only the class, fold, superfamily, and family to which a domain belongs.
Table 2. Summary of 108 queries selected from SCOP all and SCOP 95
SCOP id SCOP sccs
Table 3. Comparison of PiSA-BLAST with six methods on the dataset shown in Table 1
No. of relevant   Average no. of retrievals required b
retrievals a      DALI c   CE c   TopScan c   ProtDex2 c   BLAST   PSI-BLAST   PiSA-BLAST
1                 1        1      1           1            1       1           1
2                 2        2      2           2            2       2           2
4                 4        4      5           4            7       7           4
6                 6        6      8           6            18      17          6
8                 8        8      14          9            47      25          8
10                10       10     29          16           93      38          10
a A relevant retrieval is the retrieval of a database protein that belongs to the same ‘family’ as the query.
b Each number is the average rank, under the given method, at which the number of relevant answers in column a has been retrieved.
c The results are directly summarized from [14].
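The measure in footnote b can be computed from ranked result lists as in the sketch below. The function names and the boolean-list representation of a ranking are assumptions made for illustration; queries that never reach k relevant hits are skipped here, which is a simplification.

```python
def retrievals_required(ranked_relevance, k):
    """Return the 1-based rank at which the k-th relevant hit appears.

    ranked_relevance is a list of booleans in rank order (True means the
    hit belongs to the same family as the query). Returns None if fewer
    than k relevant hits are present.
    """
    found = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            found += 1
            if found == k:
                return rank
    return None

def average_retrievals_required(rankings, k):
    """Average the required rank over all queries that reach k relevant hits."""
    ranks = [retrievals_required(r, k) for r in rankings]
    ranks = [r for r in ranks if r is not None]
    return sum(ranks) / len(ranks) if ranks else None
```

A value equal to k, as in the DALI, CE and PiSA-BLAST columns, means that the first k hits were all relevant for every query.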
Table 4. Execution times of 20 queries on the database of 200 proteins shown in Table 1
Method        Total time a   Average time per query   Ratio relative to
              (seconds)      (seconds)                PiSA-BLAST b
DALI c        23464          1173.180                 25014
CE c          4632           231.600                  4938
TopScan c     7.310          0.366                    7.79
ProtDex2 c    1.982          0.099                    2.11
BLAST         0.335          0.0168                   0.36
PSI-BLAST     1.052          0.0526                   1.12
PiSA-BLAST    0.938          0.0469                   1.00
a The total search time of all queries against the small database.
b The ratio of each method's total time to the total time of PiSA-BLAST.
c The results are directly summarized from [14].
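The ratio column of Table 4 appears to be each method's total time divided by the total time of PiSA-BLAST. The short sketch below recomputes it from the totals in the table; the variable and function names are ours.

```python
# Total times in seconds, copied from Table 4.
total_seconds = {
    "DALI": 23464,
    "CE": 4632,
    "TopScan": 7.310,
    "ProtDex2": 1.982,
    "BLAST": 0.335,
    "PSI-BLAST": 1.052,
    "PiSA-BLAST": 0.938,
}

def ratios_relative_to(times, baseline):
    """Divide every total time by the total time of the baseline method."""
    base = times[baseline]
    return {method: t / base for method, t in times.items()}

ratios = ratios_relative_to(total_seconds, "PiSA-BLAST")
for method, ratio in ratios.items():
    print(f"{method:12s} {ratio:10.2f}")
```

A ratio above 1 means the method is slower than PiSA-BLAST, and a ratio below 1 (BLAST) means it is faster.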
Table 5. Running times of 108 queries on the database with 33311 proteins shown in Table 2
Method        Total time a      Average time per query   Ratio relative to
              (seconds)         (seconds)                PiSA-BLAST b
DALI c        about 250 days    about 2.31 days          about 216000
CE c          about 50 days     about 0.46 days          about 43000
TopScan d     11715             108.475                  117
ProtDex2 d    104               0.967                    1.05
BLAST         22.196            0.2055                   0.22
PSI-BLAST     53.722            0.4974                   0.54
PiSA-BLAST    99.901            0.9250                   1.00
a The total search time of all queries against the large database of 33311 proteins.
b The ratio of each method's total time to the total time of PiSA-BLAST.
c The total search times of DALI and CE are approximate.
d The results are directly summarized from [14].
Table 6. Comparison of the running times of BLAST, PSI-BLAST and PiSA-BLAST for 108 queries searched against five databases selected from PDB and SCOP. These 108 queries are shown in Table 2
Database    Published    Number of sequences   Total running time (seconds)
            date         in database           BLAST     PSI-BLAST   PiSA-BLAST
PDB         19-Apr-05    64333                 53.517    119.444     240.774
nr-PDB      19-Apr-05    10308                 9.164     23.883      35.050
SCOP        1.65         53659                 34.452    76.092      155.178
SCOP 95%    1.65         9354                  6.921     18.312      34.349
SCOP 40%    1.65         5630                  4.713     13.163      22.487
Table 7. Average precisions of five alignment tools on 108 queries searching on the SCOP 95 database. These 108 queries are shown in Table 2
Query  SCOP id   SCOP sccs  One-code  Query sequence  Family size   Average precision
#                           class ID  length          on SCOP 95%   CE     BLAST  PSI-BLAST  PiSA-BLAST  PiSA-PSI-BLAST
1      d1a8h_2   c.26.1.1   C         344             18            0.905  0.470  0.802      0.708       0.707
...
34     d1ep3b1   b.43.4.2   B         97              17            0.965  0.078  0.078      0.774       0.772
...
69     d1jb7a2   b.40.4.3   B         120             22            0.502  0.099  0.100      0.301       0.302
...
104    d2cmd_1   c.2.1.5    C         141             26            0.834  0.887  0.992      0.995       0.995
105    d2shpa1   c.45.1.2   C         263             12            1.000  0.988  1.000      1.000       1.000
106    d1cqda_   d.3.1.1    D         454             28            0.974  0.929  0.982      0.960       0.960
107    d3grx__   c.47.1.1   C         77              19            0.347  0.327  0.486      0.267       0.268
108    d3pmga1   c.84.1.1   C         186             9             0.412  0.335  0.334      0.337       0.337
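The average precision of one query can be computed as in the sketch below. This part of the text does not state the exact definition used for Table 7, so the code assumes the common information-retrieval form: the mean, over all family members, of the precision at the rank where each member is retrieved. Family members that are never retrieved contribute zero.

```python
def average_precision(ranked_relevance, n_relevant):
    """Average precision of one ranked result list.

    ranked_relevance: booleans in rank order (True = same family as query).
    n_relevant: number of family members in the database (assumed known).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall level
    return precision_sum / n_relevant if n_relevant else 0.0
```

Under this definition a value of 1.000 means that all family members were ranked ahead of every non-member.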
Figure 1. Step-by-step illustration of the PiSA-BLAST methodology, using 1brbI as the query protein searched against nr-PDB (protein data bank). (A) The two protein structures to be compared (1brbI in blue, 1bf0 in gray). (B) The kappa-alpha angle (κ, α) 2D maps of all residues in each of the two proteins. The two proteins have similar (κ, α) 2D maps. (C) All 3D protein structures in nr-PDB are encoded into 1D structure sequences with 23 different codes according to the (κ, α) 2D map (see text). The red codes are the SSE parts of each of the two proteins. (D) The structure search results using BLAST with our new substitution matrix (see text). (E) The alignment and score of the two 1D structure sequences. The score is calculated from the substitution matrix; e.g., the score is 6 for aligning ‘T’ to ‘T’, 6 for aligning ‘K’ to ‘K’, and -4 for aligning ‘T’ to ‘H’. (F) The structure alignment produced by the structure alignment tool CE for the alignment identified in (E).
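The scoring in panel (E) can be illustrated with a small sketch. Only the three scores quoted in the caption are used; the full 23-code matrix is not reproduced here, gaps are ignored, and the symmetry of the matrix is an assumption.

```python
# Scores quoted in the Figure 1 caption; all other entries are omitted.
EXAMPLE_SCORES = {("T", "T"): 6, ("K", "K"): 6, ("T", "H"): -4}

def pair_score(a, b, scores):
    """Look up the score of aligning code a to code b (assumed symmetric)."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]

def ungapped_score(seq_a, seq_b, scores):
    """Sum the substitution scores of two equal-length encoded sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq_a, seq_b))

print(ungapped_score("TKT", "TKH", EXAMPLE_SCORES))  # 6 + 6 - 4
```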
[Figure 2 flowchart. Translate 3D structure to 1D sequence: prepare data set from protein structure database; divide proteins into segments by kappa and alpha angle map; find a representative structural segment for each kappa-alpha angle; cluster all representative segments into 23 groups and assign one code to each. Develop a new substitution matrix: prepare data set from protein structure database; generate a substitution matrix for the 23 new codes. Structure database search using the sequence alignment tool “BLAST”. Evaluate the performance. Practical applications.]
Figure 2. Overview of our method. First, we prepare a training set from the ASTRAL SCOP 1.65 40% set. Second, we divide the domain proteins of the training set into many segments with various kappa and alpha angles. We then find a representative segment for each kappa-alpha angle and use a clustering algorithm to group these representative segments. After that, we assign a new code to each representative group. Next, we develop a substitution matrix for the new codes and use it in place of the default matrix of the sequence alignment tool. The sequence alignment tool can then perform fast protein structure searches in a database, and we evaluate its performance. Finally, we apply PiSA-BLAST to practical applications.
[Figure 3 plot. x-axis: Amino acid (A C D E F G H I K L M N P Q R S T V W Y); y-axis: Ratio of frequency (0 to 0.1); series: DSSP database, SCOP 95%, SCOP 40%, Training set]
Figure 3. Comparison of the amino acid composition of our training set, which includes 1584 proteins used for deriving the structural codes and the substitution matrix, with three well-known structure databases (the DSSP database, SCOP 95 and SCOP 40). The amino acid compositions of these four data sets are similar.
Figure 4. The kappa-alpha distribution of the 263696 segments in our training set (792 protein pairs), shown in color. The color bar on the right gives the distribution scale. These segments are encoded into 23 codes based on the distribution of the kappa and alpha angles. The helix-like segments (e.g., A, B, C and D) include more than 9000 segments whose alpha angle ranges from 40° to 60° and whose kappa angle ranges from 100° to 120°. The strand-like segments (e.g., E and F) include over 3000 segments with alpha angle ranging from -180° to -140° and kappa angle ranging from 0° to 20°.
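The idea of encoding a residue by its position on the kappa-alpha map can be sketched as follows. The 20° cell size and the three-entry lookup table are assumptions chosen only to match the two angle ranges quoted in the caption; the actual 23-code assignment of PiSA-BLAST is not reproduced here, and unmapped cells are written as "?".

```python
def angle_cell(kappa, alpha, cell_size=20.0):
    """Return the (kappa_bin, alpha_bin) grid cell of one residue.

    kappa is expected in [0, 180] and alpha in [-180, 180] degrees.
    The cell size of 20 degrees is an assumption for illustration.
    """
    if not (0.0 <= kappa <= 180.0) or not (-180.0 <= alpha <= 180.0):
        raise ValueError("angle out of range")
    # Clamp the upper boundary into the last cell.
    k_bin = min(int(kappa // cell_size), int(180 // cell_size) - 1)
    a_bin = min(int((alpha + 180.0) // cell_size), int(360 // cell_size) - 1)
    return k_bin, a_bin

# Hypothetical cell-to-code table: one helix cell (kappa 100-120, alpha 40-60)
# and two strand cells (kappa 0-20, alpha -180 to -140).
HYPOTHETICAL_CELL_TO_CODE = {(5, 11): "A", (0, 0): "E", (0, 1): "E"}

def encode(angles, table=HYPOTHETICAL_CELL_TO_CODE, unknown="?"):
    """Translate a list of (kappa, alpha) pairs into a 1D code string."""
    return "".join(table.get(angle_cell(k, a), unknown) for k, a in angles)
```

For example, `encode([(110, 50), (10, -160)])` maps a helix-like residue and a strand-like residue to the two illustrative codes.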
[Figure 5 plots. Panel titles: Accumulated distributions of 20 aa.; Accumulated distributions of 23 codes]
Figure 5. Accumulated distributions of (A) the 20 amino acids and (B) the 23 new codes in the training set. The accumulated distribution of the 23 codes is similar to that of the 20 amino acids. The most frequent amino acid is leucine (L), with a ratio of 9.26%. The most frequent of the 23 new codes for PiSA-BLAST is H, with a ratio of 6.99%.
Figure 6. The conformations of the representative segments of the 23 new codes. Codes A, Y, B, C and D are helix; G, I and L are helix-like; E, F and H are strand; K and N are strand-like; the other codes are loop-like segments.
[Figure 7 panels. I: Helix (A, Y, B, C, D); II: Helix-like (G, I, L); III: Strand (E, F, H); IV: Strand-like (K, N)]
Figure 7. The conformations of the representative segment in each cell of the four main groups: (I) helix codes (A, Y, B, C, D) have 4 segments; (II) helix-like codes (G, I, L) have 12 segments; (III) strand codes (E, F, H) have 15 segments; (IV) strand-like codes (K, N) have 11 segments. As the conformations show, segments within the same secondary-structure region are structurally very similar.
[Figure 8 plots. Panels: (A) DSSP code: H, G and I; (B) DSSP code: E and B; (C) DSSP code: S, T and others. x-axis: PiSA-BLAST code (A&Y B C D G I L E F H K N M P Q R S T V W X Z); y-axis: Frequency]
Figure 8. The relationship between the distributions of the 23 new codes (in PiSA-BLAST) and the 8 secondary structure codes (in DSSP): (A) the distribution of structural codes for the helix codes (H, G and I) in DSSP; (B) the distribution of structural codes for the strand codes (E and B) in DSSP; (C) the distribution of structural codes for the loop codes (S, T and others) in DSSP. The helix, helix-like, strand and strand-like segments defined by PiSA-BLAST are highly related to the secondary structures in DSSP.
[Figure 9 plot. x-axis: λ value (1.5 to 2.5); y-axis: Average precision (0.665 to 0.705); series (gap open, gap extend): 9,1; 10,1; 11,1; 6,2; 7,2; 8,2; labeled value: 0.698691]
Figure 9. The average precision of PiSA-BLAST on 108 queries searched against SCOP 95 for various values of λ and gap penalties. We tested six combinations of gap-open and gap-extension penalties with different λ values to find the parameters that optimize the performance of PiSA-BLAST. Here, the gap-open penalty is set to 8 and the gap-extension penalty to 2.
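The parameter scan behind Figure 9 amounts to a grid search, sketched below. The six gap-penalty pairs and the λ range are read from the figure legend and axis; `evaluate` is a hypothetical caller-supplied function that would run all 108 queries with the given settings and return the average precision.

```python
from itertools import product

# (gap open, gap extend) pairs from the Figure 9 legend.
GAP_PAIRS = [(9, 1), (10, 1), (11, 1), (6, 2), (7, 2), (8, 2)]
# Lambda values 1.5, 1.6, ..., 2.5 from the Figure 9 axis.
LAMBDAS = [round(1.5 + 0.1 * i, 1) for i in range(11)]

def best_parameters(lambdas, gap_pairs, evaluate):
    """Return (score, lambda, gap_open, gap_extend) with the highest score.

    evaluate(lam, gap_open, gap_extend) -> float is a hypothetical
    stand-in for a full benchmark run.
    """
    best = None
    for lam, (gap_open, gap_extend) in product(lambdas, gap_pairs):
        score = evaluate(lam, gap_open, gap_extend)
        if best is None or score > best[0]:
            best = (score, lam, gap_open, gap_extend)
    return best
```

A finer second pass over a narrow λ interval around the best coarse value can reuse the same function with a denser `lambdas` list.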
[Figure 10 plot. x-axis: λ value (1.82 to 1.93); y-axis: Average precision (0.69 to 0.7); labeled value: 0.69942]
Figure 10. The average precision of PiSA-BLAST on 108 queries searched against SCOP 95 for various values of λ. The best performance is obtained with λ = 1.89, a gap-open penalty of 8, and a gap-extension penalty of 2.
[Figure 11 matrix. Row and column order of the 23 codes: A Y C B D H E F K N T P X V M G I L W S R Q Z]
Figure 11. The substitution matrix of the 23 new codes. The scores on the diagonal are much higher than the off-diagonal scores. The red dotted square (A, Y, C, B, and D) contains the scores for aligning helix codes to helix codes, and the blue dotted square (H, E, and F) contains the scores for aligning strand codes to strand codes. The scores for aligning helix codes to strand codes are the lowest.
[Figure 12 plot. x-axis: Recall (0% to 100%); y-axis: Precision (0 to 1); series: CE (z-score), CE (rmsd)]
Figure 12. Recall-precision curves of CE on 108 queries searched against the SCOP 95 database, with the search results ordered by z-score or by rmsd. The CE results sorted by z-score are much more accurate than those sorted by rmsd.
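The points of a recall-precision curve for a single query can be obtained as in the following sketch. The boolean-list input and the function name are assumptions for illustration; averaging the curves over the 108 queries is not shown.

```python
def recall_precision_points(ranked_relevance, n_relevant):
    """Return (recall, precision) at the rank of each relevant hit.

    ranked_relevance: booleans in rank order (True = same family as query).
    n_relevant: total number of family members in the database.
    """
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / n_relevant, hits / rank))
    return points
```

Sorting the same hit list by a different key, such as z-score instead of rmsd, changes only the order of `ranked_relevance` and therefore the shape of the curve.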
[Figure 13 plot. x-axis: Recall (0% to 100%); y-axis: Precision (0 to 1); series: TopScan, ProtDex2, PiSA-BLAST, BLAST, PSI-BLAST]
Figure 13. Recall-precision curves of five alignment tools for 108 queries on the large database of 33311 proteins indicated in Table 2. The results of ProtDex2 and TopScan, two fast structure alignment tools, are summarized from [14]. PiSA-BLAST is the best and TopScan is the worst among these five approaches.
[Figure 14 plot. x-axis: Recall (0% to 100%); y-axis: Precision (0 to 1); series: CE, BLAST, PSI-BLAST, PiSA-BLAST, PiSA-PSI-BLAST]
Figure 14. Recall-precision curves for 108 queries with CE, BLAST, PSI-BLAST, PiSA-BLAST and PiSA-PSI-BLAST on the SCOP 95 database (ver 1.65). The accuracy of PiSA-BLAST approaches that of CE, and PiSA-BLAST is about 34000 times faster than CE. Surprisingly, PiSA-PSI-BLAST improves only slightly on PiSA-BLAST. In contrast, PSI-BLAST performs much better than BLAST.
[Figure 15 plot. x-axis: False positive rate (0 to 1); y-axis: True positive rate (0 to 1); series: BLAST, PSIBLAST, PiSA-BLAST]
Figure 15. ROC curves of three tools for 108 queries on the large database of 33311 proteins shown in Table 2. PiSA-BLAST is more accurate than the other methods.
[Figure 16 plot. x-axis: False positive rate (0 to 1); y-axis: True positive rate (0 to 1); series: CE, BLAST, PSIBLAST, PiSA-BLAST, PiSA-PSI-BLAST]
Figure 16. ROC curves of five tools for 108 queries on the SCOP 95% database. PiSA-BLAST and PiSA-PSI-BLAST perform close to CE and are more accurate than the sequence alignment tools BLAST and PSIBLAST.
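For reference, the points of an ROC curve and the area under it can be computed from a ranked hit list as sketched below. This is a generic construction and not necessarily the exact procedure used for Figures 15 and 16; the function names are ours.

```python
def roc_points(ranked_relevance):
    """Return (false positive rate, true positive rate) after each hit.

    ranked_relevance: booleans in rank order (True = same family as query).
    """
    n_pos = sum(1 for r in ranked_relevance if r)
    n_neg = len(ranked_relevance) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_relevant in ranked_relevance:
        if is_relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfect ranking gives an area of 1.0, and a curve closer to the upper-left corner indicates a more accurate method.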
[Figure 17 image: panels (A) to (D). Recoverable text from panels (C) and (D):]
HEADER SCOP/ASTRAL domain d1c41a_ [30962] 21-NOV-03 0000
Chain 1: d1di0a_.ent:A (Size=148)
Chain 2: d1c41a_.ent:A (Size=72)
Alignment length = 61 Rmsd = 3.87A Z-Score = 3.7 Gaps = 21(34.4%) CPU = 0s Sequence identities = 8.2%
Chain 1: 58 RTGRYAAIVGAAFVIDGGIYDHDFVATAVINGMMQVQLETEV---PVLSVVLTPHHFHESKEHHDFFHAH
Chain 2: 7 HDGSALRIGIVHARWN---ETIIEPLLAGTKAKLLACGVKESNIVVQSVPG---SWEL
Chain 1: 125 FKVKGVEAAHAA
Chain 2: 59 PIAVQRLYSASQ
Figure 17. Illustration of the “chain-break” problem in CE alignment. (A) The 3D structure of the subject protein “d1c41a_”; (B) the superimposed structures of the query “d1di0a_” and the subject protein “d1c41a_” from CE; (C) the coordinate file of the 3D structure of “d1c41a_”; (D) the alignment file of the CE result. The subject protein “d1c41a_” has a chain break, marked with blue squares in (A) and (C): the residue numbering jumps from 76 to 107. The structure alignment of the two proteins is therefore somewhat unsatisfactory. Furthermore, as the alignment result in (D) shows, the alignment length is shorter than the length of the query protein, and both the Z-score (3.7) and the Rmsd (3.87 Å) are poor. We also observed that CE determines the wrong length for the domain “d1c41a_”. As the red underline shows, the original length of “d1c41a_” is 165, but the size detected by CE is only 72 because of the chain-break problem.
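A chain break of this kind can be detected before alignment by scanning the residue numbers of a chain, as in the sketch below. It assumes plain integer residue numbers in file order and ignores insertion codes and other numbering irregularities found in real coordinate files.

```python
def find_chain_breaks(residue_numbers):
    """Return (last_before, first_after) for every gap in the numbering."""
    breaks = []
    for prev, cur in zip(residue_numbers, residue_numbers[1:]):
        if cur - prev > 1:
            breaks.append((prev, cur))
    return breaks

# Numbering that jumps from 76 to 107, as described for d1c41a_.
print(find_chain_breaks([74, 75, 76, 107, 108]))
```

Domains flagged in this way could be checked by hand, or compared segment by segment, before their CE scores are trusted.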
[Figure 18 image: panels (A) and (B). Recoverable text from panel (B):]
Chain 1: /data/pdb/scop/scop/pdbstyle-1.65/ej/d1ej8a_.ent:A (Size=140)
Chain 2: /data/pdb/scop/scop/pdbstyle-1.65/es/d1eso__.ent:_ (Size=154)
Alignment length = 109 Rmsd = 2.07A Z-Score = 4.4 Gaps = 75(68.8%) CPU = 0s Sequence identities = 15.6%
Chain 1: 1 SSAVAILETFQ---KYTIDQKKDTAVRGLARIVQVGENKTLFDITVNGVPEAGNYHASIHEKGDVSK---
Chain 2: 1 ASEKVEMNLVTSQGV---GQSIGSVTITETD-KGLEFSPDLKAL-PPGEHGFHIHAKGSCQPATK
Chain 1: 65 ---GVESTGKVW---HKFDEPIECFNESDLGKNLYSGKTFLSAP--LPTWQLIG
Chain 2: 61 DGKASAAESAGGHLDPQNTGKHEGPEGAGHLGDLPALVVNND---GKATDAVIAPRLKSLDEIKD
Chain 1: 111 RSFVISK---SLNHPENEPSSVKDYSFLGVIA
Chain 2: 123 KALMVHVGGDNMSDQPKPLGGGG---ERYACGVIK
Figure 18. Illustration of the problem of ordering search results by Z-score in CE alignment. (A) The superimposed structures of query “#32 d1ej8a_” and subject protein “d1eso__” from CE; (B) the alignment file of the CE result. The structures of the query and subject proteins are similar and the rmsd is 2.07, but the Z-score is only 4.4. As a result, the subject protein is ranked 50th, behind 40 false-positive proteins.
[Figure 19 plot. x-axis: Z-score of CE (0 to 8); y-axis: Log(e-value of PiSA-BLAST) (-180 to 0); marked areas: (A) and (B)]
Figure 19. The relationship between e-value and structural similarity in PiSA-BLAST. The 1681 points on the plot represent all query-subject protein pairs from the search of the SCOP 95 database. There are 943 points in area (A) and only 79 points in area (B). When the e-value is less than 10^-15, 98.6% and 92.2% of the proteins have Z-scores above 4.0 and 5.0, respectively. PiSA-BLAST thus provides a significance estimate, like the e-value in BLAST, that indicates how reliable a hit is.
[Figure 20 plot. x-axis: Precision (%) in bins 100, 90-100, 80-90, 70-80, 60-70, 50-60, 40-50, 30-40, 20-30, 10-20, 0-10; y-axis: Percentage (0% to 100%); series: <=1.0E-15, >1.0E-15]
Figure 20. The relationship between e-value and precision in PiSA-BLAST for 108 queries on the SCOP 95 database. The yellow bars show the distribution for protein pairs with a PiSA-BLAST e-value of at most 10^-15, and the red bars show the distribution for pairs with an e-value above 10^-15. Among the protein pairs with an e-value below 10^-15, 91% have a precision of 80% or higher.
[Figure 21 plots. Panel title: Seq. Identity >=25%; panels (A) and (B); x-axis: Precision (%) in bins from 80-100 down to 0-10; y-axis: Percentage (%); series: BLAST, PiSA-BLAST]
Figure 21. Comparison of PiSA-BLAST with BLAST for high sequence identity (> 25%) on two databases: (A) the database of 33311 proteins shown in Table 2 and (B) SCOP 95. PiSA-BLAST and BLAST perform similarly.
[Figure 22 plots. Panel title: Seq. Identity <25%; panels (A) and (B); x-axis: Precision (%) in bins from 80-100 down to 0-10; y-axis: Percentage (%); series: BLAST, PiSA-BLAST]
Figure 22. Comparison of PiSA-BLAST with BLAST for low sequence identity (< 25%) on two databases: (A) the database of 33311 proteins shown in Table 2 and (B) SCOP 95. PiSA-BLAST is much better than BLAST at low sequence identity. The performance of BLAST is more sensitive to sequence identity than that of PiSA-BLAST.
[Figure 23 plots. Panel title: Z-score >=3.5; panels (A) and (B); x-axis: Precision (%) in bins from 80-100 down to 0-10; y-axis: Percentage (%); series: BLAST, PiSA-BLAST]
Figure 23. Comparison of PiSA-BLAST with BLAST for high Z-score (> 3.5 by CE) on two databases: (A) the database of 33311 proteins shown in Table 2 and (B) SCOP 95. PiSA-BLAST outperforms BLAST, especially when the sequence identity is low.
[Figure 24 plots. Panel title: Z-score <3.5; panels (A) and (B); x-axis: Precision (%) in bins from 80-100 down to 0-10; y-axis: Percentage (%); series: BLAST, PiSA-BLAST]
Figure 24. Comparison of PiSA-BLAST with BLAST for low Z-score (< 3.5 by CE) on two databases: (A) the database of 33311 proteins shown in Table 2 and (B) SCOP 95. PiSA-BLAST outperforms BLAST, especially when the sequence identity is low.
Figure 25. The correlations between the Z-score (CE) and the sequence identity calculated by (A) PiSA-BLAST and (B) BLAST. The correlation coefficient between the encoded sequence identity of PiSA-BLAST and the Z-score of CE is 0.72, whereas the correlation coefficient between the amino acid sequence identity and the Z-score is 0.61.
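The caption does not say which correlation coefficient was used. Assuming the usual Pearson coefficient, it can be computed from paired identity and Z-score values as in the following sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the sequence identity of each protein pair and `ys` the corresponding CE Z-score.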
[Figure 26 image: panels (A) to (D). Recoverable text from panels (A) to (C); column spacing of the match lines was lost in extraction:]
d1qe0a1 95 aa vs. d1nj1a1 127 aa 15.5% identity;
----IEENL---DLFIVTM---GDQADRYAVKLLNHLRHNGIKADKDYLQRKIKGQMK--QADRLGAKFTIVIGDQELENNKIDVKNMTTGESETIELDALVEYFKK
. .. .. :: . .... . .: ..:. :... : .: :.. : . . :. . . :: ..::.. .. :::. : .:... : ...
SGLCLPPDVAAHQVVIVPIIFKKAAEEVMEACRELRSRLEAAGFRVHLD--DRDIRAGRKYYEWEMRGVPLRVEIGPRDLEKGAAVISRRDTGEKVTADLQGIEETLRE
Score = 108 bits (272), Expect = 1e-25, Identities = 40/93 (43%), Positives = 74/93 (79%), Gaps = 5/93 (5%)
Query: 4 MPFEEF----VTIDCBBGGDCDB-CACBBSRTNHFKHMSRKKIAACCDBACDSNEMPEEEHNIADDCSXNKHEFFGLSXNEFHKKCGQMLYDC 91
+PFEEF V +++++ +C+B CA++BSRT HF H+ + + +++++++++S MP+EEHN ++++S +KHEFFGLS++++HKK QM++DC
Sbjct: 11 VPFEEFHHTQVQMAYYYBAYCYBACABDBSRTRHFEHXTQTNDDCBYYACDCSRKMPFEEHNADCGASQPKHEFFGLSRHHKHKKGWQMDCDC 103
Chain 1: d1qe0a1.ent:A (Size=95)
Chain 2: d1nj1a1.ent:A (Size=127)
Alignment length = 93 Rmsd = 1.74A Z-Score = 5.5 Gaps = 5(5.4%) CPU = 0s Sequence identities = 18.3%
Chain 1: 2 EENLDLFIVTMG---DQADRYAVKLLNHLRHNGIKADKDYLQRKIKGQMKQADRLGAKFTIVIGDQELENNKIDVKNMTTGESETIELDALVE
Chain 2: 9 VAAHQVVIVPIIFKKAAEEVMEACRELRSRLEAAGFRVHLDDRDIRAGRKYYEWEMRGVPLRVEIGPRDLEKGAAVISRRDTGEKVTADLQGIEE
Figure 26. The results of FASTA, PiSA-BLAST and CE alignments for related domains: query protein “d1qe0a1” and subject protein “d1nj1a1”. (A) The alignment of the original amino acid sequences by FASTA, (B) the database search with structure-encoded sequences by PiSA-BLAST, (C) the structural alignment by CE and (D) the superimposed conformations of d1qe0a1 (blue) and d1nj1a1 (red) by CE. The sequence identity is 15.5% and the e-value of PiSA-BLAST is 10^-25. The Z-score of the CE result is 5.5, and the conformations of the query and subject proteins are similar.
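The identity percentages reported in these alignments can be reproduced from a pair of aligned strings, as in the sketch below. It divides by the full alignment length including gap columns, which is how BLAST reports identities (for example 40/93, or 43%); other tools may use a different denominator, so values from FASTA, BLAST and CE are not directly comparable.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same non-gap symbol."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)

print(percent_identity("AC-T", "ACGT"))
```

The same function applies to amino acid sequences and to the 23-code encoded sequences, since it only compares symbols column by column.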
[Figure 27 image: panels (A) to (D). Recoverable text from panels (A) to (C); column spacing of the match lines was lost in extraction:]
d1gr3a_ 132 aa vs. d1aly__ 146 aa
Score = 80.3 bits (199), Expect = 3e-17, Identities = 38/93 (40%), Positives = 63/93 (67%), Gaps = 15/93
Query: 36 SRPEFHMSXVPF-HEEEEFFH---XWVTHHEFN--EVWQP---HMPEEHFFEKVLP- 82
S++ ++M XVPF HE+EE FH X++TH+EF+ E W+P H+PEEHFF KV +
Sbjct: 41 SQTNEEMPXVPFNHEFEEHFHNFGADPXSMTHEEFHFEEKWRPNNFHVPEEHFFNKVQTT 100
Query: 83 --HFEEHEEKHFHETLQTEXKHKHMPGDQXHHN 113
H++E+ K++++ L + KH MP++Q +++
Sbjct: 101 PFHKFEEXZKNKNFKLZEHHKHEKMPMBQTENT 133
Chain 1: d1gr3a_.ent:A (Size=132)
Chain 2: d1aly__.ent:_ (Size=146)
Alignment length = 118 Rmsd = 1.91A Z-Score = 5.7 Gaps = 32(27.1%) CPU = 0s Sequence identities = 11.0%
Chain 1: 3 VSAFTVILSKAYP---AIGTPIPFDKI--LYNRQQ-HYDPRTGIFTCQIPGIYYFSYHVHVKGT---
Chain 2: 6 QIAAHVISEA--SSKTTS--VLQWAEKGYYTMSNNLVTLENGKQLTVKRQGLYYIYAQVTFCSNREASSQ
Chain 1: 61 -HVWVGLYKN---GTPVMYTYDEY---TKGYLDQASGSAIIDLTENDQVWLQLPNAESN--GLYSSEY
Chain 2: 72 APFIASLCLKSPGRFERILLRAANTHSSAKPCGQQSIHLGGVFELQPGASVFVNV----TDPSQVSHG-T
Chain 1: 120 VHSSFSGFLV
Chain 2: 137 GFTSFGLLKL
Figure 27. The results of FASTA, PiSA-BLAST and CE alignments for related domains: query protein “d1gr3a_” and subject protein “d1aly__”. (A) The alignment of the original amino acid sequences by FASTA, (B) the database search with structure-encoded sequences by PiSA-BLAST, (C) the structural alignment by CE and (D) the superimposed conformations of d1gr3a_ (blue) and d1aly__ (red) by CE. The sequence identity is 17.2% and the e-value of PiSA-BLAST is 3×10^-17. The Z-score of the CE result is 5.7, and the conformations of the query and subject proteins are similar.
[Figure 28 image: panels (A) to (D). Recoverable text from panels (A) to (C); column spacing of the match lines was lost in extraction:]
d1dbqa_ 276 aa vs. d1tlfa_ 296 aa
Identities = 101/272 (37%), Positives = 211/272 (77%), Gaps = 20/272 (7%)
Query: 1 FEEHTVPGQTDLGGCBBBYYCDCDDDSRTFFHHEHETQTGYACDDCDCBYCD-SNFVPEE 59
FEEH V++Q ++ ++B+++++++D+SRTF+ ++++ Q+ A+++++CB++D S+ VP+E
Sbjct: 2 FEEHFVTQQSGDYCBYBDBBBAYBDASRTFEKEFEFKQPXLAACBBYCBABDGSPNVPFE 61
Query: 60 HNTMPEKIDCBDBDGBRXGSN-NEEEEMWTPKQSQTKEEHKXMBDCYBBACBYYCBSRHM 118
H T + +++C+++++B GS+ + E++M T Q Q + K ++++++++CBY++BSRHM
Sbjct: 62 HXT-NKNLACACYYBB--GSTQTNEFHMQTKDQPQSRTHKKHGDCAAYCDCBYABBSRHM 118
Query: 119 PHFHHK---DYDCBBDAACDSRNFFFIBGRFEFNQTBBDCAACYCACDQPDPK 168
P+F+HK +Y+++B++++DS++++ F++NQ+B++++A++C+++
Sbjct: 119 PEFEHKTTLQPDCACYCYCBYBLCDIDSQTKEV--PHFHKNQPBDBYCAABCYACBSQHE 176
Query: 169 FVPFEFMPBABGDCBDCBCBBSRTKVILZPHEEFMPNKLQMBQTLSTFXNKHFXMBBYYC 228
FVPF+F++ +++DC+++B++BS+TKVILZP+EE+MPNKLQMBQTLS+F N +F M++++C
Sbjct: 177 FVPFHFVTIDDBDCYBDBDDBSQTKVILZPEEEKMPNKLQMBQTLSNFNNTEFHMACADC 236
Query: 229 DCA-BDBCDDLISHHXTHFEEEFFHKEFHTQM 259
++A ++++++ HX EE+ +KEFHT +
Sbjct: 237 AAAYCAAYCGSRTRHXSK--EEKNEKEFHTLG 266
Chain 1: d1dbqa_.ent:A (Size=128)
Chain 2: d1tlfa_.ent:A (Size=296)
Alignment length = 115 Rmsd = 2.87A Z-Score = 5.9 Gaps = 4(3.5%) CPU = 0s Sequence identities = 15.7%
Chain 1: 1 KSIGLLATSSEAAYFAEIIEAVEKNCFQKGYTLILGNAW-NNLEKQRAYLSMMAQKRVDGLLVMCSEYPE
Chain 2: 2 LLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVSGLIINYPLDDQ
Chain 1: 70 PLLAMLEEYRHIPMVVMDWGEAKADFTDAVID--NAFEGGYMAGRYLIE
Chain 2: 72 DAIAVEAACTNVPALFLDVSDQT-PINSIIFSHEDGTRLGVEHLVALGH
Figure 28. The results of FASTA, PiSA-BLAST and CE alignments for related domains: query protein “d1dbqa_” and subject protein “d1tlfa_”. (A) The alignment of the original amino acid sequences by FASTA, (B) the database search with structure-encoded sequences by PiSA-BLAST, (C) the structural alignment by CE and (D) the superimposed conformations of d1dbqa_ (blue) and d1tlfa_ (red) by CE. The sequence identity is 25.8%. The e-value of the PiSA-BLAST alignment is 5×10^-69. The Z-score of the CE result is 5.9, and the conformations of the query and subject proteins are similar.
d1cjwa_ 166 aa vs. d1cm0a_ 162 aa
Identities = 49/154 (31%), Positives = 101/154 (65%), Gaps = 27/154 (17%)
Query: 5 EHNTHKLGQM---YYYBDBGLILIBDLSGPKVPKLYBYBDCALLRQGGP--EFHEVWQTH 59