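Case study - Foxk1 - Web server - Using machine learning algorithms to select appropriate template structures to improve the accuracy of predicted transcription factor binding sequence features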

Chapter 5. Web server

5.3 Case study - Foxk1

This case illustrates how to use DBD2BS to infer the binding PWM of mouse Foxk1 (Forkhead box protein K1) from one of the unbound structures of its DNA-binding domain, and how to interpret DBD2BS’s output with the utilities embedded in DBD2BS.

The query (PDB ID: 2D2W:A) was submitted to DBD2BS with the ‘Query with a protein structure’ form. Because 2D2W is an NMR structure, DBD2BS selects the first MODEL by default. After processing completes, the user sees the candidate templates sorted by their TM-scores (that is, their structural similarity to the query structure) (Figure 15). In this case, the top three templates suggested by DBD2BS are used. The results are shown in Figure 15.
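To make the "first MODEL" default concrete, the following is a minimal sketch of how the first model of a multi-model NMR entry could be extracted from PDB-format text. It is not DBD2BS's own code; the function name `first_model_atoms` and its `chain` parameter are assumptions introduced here for illustration. It relies only on the standard PDB layout, in which models are delimited by MODEL and ENDMDL records and the chain identifier sits in column 22 of each coordinate record.

```python
def first_model_atoms(pdb_text, chain=None):
    """Return the ATOM/HETATM lines of the first MODEL in PDB-format text.

    Hypothetical helper for illustration only. If `chain` is given,
    only records of that chain are kept.
    """
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            # Everything after the first ENDMDL belongs to later models.
            break
        if record in ("ATOM", "HETATM"):
            # The chain identifier is column 22 (index 21) of the record.
            if chain is None or (len(line) > 21 and line[21] == chain):
                atoms.append(line)
    return atoms
```

For a query such as 2D2W:A, the call would be `first_model_atoms(text, chain="A")`. A file without MODEL records (for example, a crystal structure) passes through unchanged, because no ENDMDL record is ever encountered.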

On the result page, the selected templates are listed in the left area together with their PWMs and other detailed information (marked as “1” in Figure 16). Clicking the “3D” icon of a PWM in “1” loads the superimposed complex in the 3D view. The loaded complex contains the protein-DNA template and the superimposed query, which helps users observe the query-DNA interactions and the conformational changes between the unbound state (the query) and the bound state (the protein in the template) (“3” in Figure 16). DBDs are displayed as sticks. Residues within 1.5 Å of any heavy atom of the DNA are colored red when the option “Atom collision” (“2” in Figure 16) is turned on. DNA base pairs are colored according to their conservation level. The 5’ end of the PWM (position ‘1’ in the PWM) is highlighted in the Jmol panel by showing the corresponding base in green, so that users can quickly link the PWM with the DNA in the Jmol panel.
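The “Atom collision” rule can be sketched as a simple distance test. The code below is an illustration under stated assumptions, not the server's implementation. The function name `colliding_residues` and the tuple-based atom representation are hypothetical. Every protein atom passed in is tested (the text does not say whether protein hydrogens are excluded). "Within 1.5 Å" is read as a distance less than or equal to the cutoff.

```python
import math


def colliding_residues(protein_atoms, dna_atoms, cutoff=1.5):
    """Return sorted IDs of residues with an atom within `cutoff` Å of DNA.

    protein_atoms: iterable of (residue_id, element, (x, y, z))
    dna_atoms:     iterable of (element, (x, y, z))
    Hypothetical data layout; only DNA heavy (non-hydrogen) atoms count.
    """
    dna_heavy = [xyz for element, xyz in dna_atoms if element != "H"]
    hits = set()
    for residue_id, _element, xyz in protein_atoms:
        # A single close contact is enough to flag the whole residue.
        if any(math.dist(xyz, d) <= cutoff for d in dna_heavy):
            hits.add(residue_id)
    return sorted(hits)
```

Because the query is superimposed onto a bound template rather than docked, such short distances are physically implausible contacts, which is why the text treats them as a sign of conformational change between the unbound and bound states.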

Clicking the “open” button of each template in “1” shows more detail about the template and the prediction made by using it, including the files ready for downloading as follows:

• superimposed complex: the protein-DNA complex structure that DBD2BS generates from the query,
• template complex: the protein-DNA complex structure whose protein is similar to the query protein,
• alignment: the result reported by the structure alignment tool,
• contact residue: the contact residues of the query protein in the superimposed complex,
• PWM: the binding profile generated by DBD2BS.

Figure 15 Snapshot of the template select page.

Figure 16 Snapshot of the result page. The “3D” icon of the first PWM (template ID 2C6Y:B) in “1” was clicked and the option “Atom collision” in “2” was turned on.

Users can click the “CMP” icon of each PWM to see whether the predicted PWM is supported by the PWMs predicted from other templates. In this case, the “CMP” icon of the first PWM (template ID 2C6Y:B) was clicked (Figure 16) because it has the lowest E-value. On the comparison page, the PWM of the selected template is highlighted as the reference PWM. The reference PWM is aligned against the other PWMs as follows. The reference complex (the superimposed complex of the query protein and the template complex corresponding to the reference PWM) is first aligned to the other complexes by superimposing the query protein inside them. After superimposition, the DNA structures from the two complexes are structurally aligned via dynamic programming. Base pairs from different complexes are aligned if they are closer than 2 Å. This may result in a discontinuous alignment of the two sequence logos.
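The base-pair alignment step can be sketched as an order-preserving dynamic program. The text does not give DBD2BS's scoring scheme, so the version below makes explicit assumptions. Each base pair is reduced to a single 3D point (for example, a base-pair center). The score is simply the number of matched pairs. Two base pairs may be matched only if their points are closer than the 2 Å cutoff. The function name `align_base_pairs` is hypothetical.

```python
import math


def align_base_pairs(ref_points, other_points, cutoff=2.0):
    """Align two ordered lists of base-pair points after superimposition.

    Returns 1-based (ref_position, other_position) pairs. Illustrative
    LCS-style dynamic program; not the scoring used by DBD2BS.
    """
    n, m = len(ref_points), len(other_points)

    def close(i, j):
        return math.dist(ref_points[i - 1], other_points[j - 1]) < cutoff

    # score[i][j] = max matches using the first i and first j base pairs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])
            if close(i, j):
                best = max(best, score[i - 1][j - 1] + 1)
            score[i][j] = best

    # Trace back from the corner to recover the matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if close(i, j) and score[i][j] == score[i - 1][j - 1] + 1:
            pairs.append((i, j))
            i -= 1
            j -= 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

A reference position with no partner within the cutoff is skipped, which is how a discontinuous alignment arises: positions 1, 3 and 4 of the reference may be matched while position 2 is left unaligned.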

In Figure 17, the unaligned positions have been trimmed. Comparing the PWMs from different templates shows which positions in the predicted PWM have higher confidence, namely those where consistent predictions are observed. In this tutorial, the prediction based on 2C6Y:B was consistent with those based on 2C6Y:A, 2AS5:F and 3G73:A at four positions (xAxACA) (positions ‘9’ and ‘12’-‘16’ on 2C6Y:B). Closer inspection of the first and third positions of the trimmed PWM on 2C6Y:B (positions ‘9’ and ‘13’ of the original PWM) shows that the first position, ‘G’, aligns to ‘T’ on the other templates and the third position, ‘T’, aligns to ‘A’ on the other templates. To confirm this, the user can click the “CMP” icon of the other PWMs. For example, clicking the “CMP” icon of the second PWM (template ID 2C6Y:A) opens a comparison page showing that the third position of the trimmed PWM on 2C6Y:A (position ‘8’ of the original PWM), ‘A’, aligns to ‘T’ on 2C6Y:B but to ‘A’ on the other templates. This observation gives us higher confidence in changing ‘T’ to ‘A’ at the third position on 2C6Y:B. Thus, we get the consensus “AAACA”.
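The cross-template reasoning above amounts to a per-column vote, which can be sketched as follows. This is a simplification made for illustration. Each PWM column is reduced to its top base, whereas a real PWM column is a distribution over four bases. The function name `majority_consensus` is hypothetical. The input strings are reconstructed from the bases named in the text for the six trimmed columns.

```python
from collections import Counter


def majority_consensus(top_bases_by_template):
    """Vote per aligned column over the top base of each template's PWM.

    top_bases_by_template: dict mapping template ID -> string of top bases
    over the same aligned (trimmed) columns. Returns the majority string
    and, per column, the fraction of templates agreeing with it.
    """
    seqs = list(top_bases_by_template.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all templates must cover the same aligned columns")
    consensus, support = [], []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        support.append(count / len(seqs))
    return "".join(consensus), support


# Top bases over the six trimmed columns, as described in the text.
templates = {
    "2C6Y:B": "GATACA",
    "2C6Y:A": "TAAACA",
    "2AS5:F": "TAAACA",
    "3G73:A": "TAAACA",
}
consensus, support = majority_consensus(templates)
```

With these inputs the majority string is `TAAACA`, and only columns 1 and 3 have support below 1.0, which are the two columns where 2C6Y:B disagrees with the other templates. The last five columns spell `AAACA`, the consensus reported in the text. The sketch only counts votes and does not decide which columns to keep.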

In addition, Figure 16 shows that the collisions, which indicate a large conformational change, occur near position ‘11’ of the PWM (the green base in Jmol marks position ‘1’, the 5’ end of the PWM) and positions ‘3’, ‘4’ and ‘5’ of the other DNA chain on 2C6Y:B. Likewise, the collisions occur near position ‘6’ (‘7’, ‘8’ and ‘9’) on 2C6Y:A, positions ‘13’, ‘14’ and ‘15’ (‘7’) on 2AS5:F, and position ‘11’ (‘8’-‘11’) on 3G73:A. In this case, the positions near the collisions (position ‘11’) differ from the positions predicted by DBD2BS (positions ‘9’ and ‘12’-‘16’) on 2C6Y:B. Thus, users can disregard these collisions.

To further assess the accuracy of the consistent positions identified above, they were aligned to the annotated PWM (Figure 18). According to the annotated PWM (the first row of Figure 18) obtained from [37], the consensus “AAACA” has five correct bases among the five positions in the annotated motif. Furthermore, the similarity and complete-similarity between the annotated PWM and the consensus are 0.95 and 0.47, respectively. This demonstrates DBD2BS’s success in predicting the most important positions (the largest characters in the sequence logo) of the annotated PWM (Figure 18) by comparing similar templates. DBD2BS is also able to provide useful information based on the unbound structure of the query DBD along with a bound structure from homologues.

Figure 17 Snapshot of the comparison page.

Figure 18 Alignment of the annotated and predicted consensus by DBD2BS for the mouse Forkhead box protein K1 (Foxk1). Rows are labeled “Annotation” and “consensus”.

Chapter 6. Conclusion and suggestion
