Evolutionary analysis of the two-component systems in Pseudomonas aeruginosa PAO1

2.1 Introduction

The two-component system (2CS) is the means by which bacteria commonly

regulate adaptive responses to changing environments. A 2CS typically comprises a

sensor histidine kinase and a response regulator (Stock et al. 1990). The sensor kinase

consists of at least one signal recognition (input) domain coupled to an autokinase

(transmitter) domain. Signals binding to the input domain cause activation of the

autokinase and thereby, hydrolysis of an ATP molecule to phosphorylate a conserved

histidine residue (Stock et al. 1989). The phosphate group is subsequently transferred

to the conserved aspartate residue at the receiver domain of a response regulator.

Most sensor kinases contain one input domain and one transmitter domain and are

hence called classical (IT-type) sensors. Some sensors contain both the sensor kinase

signature and a receiver domain of the response regulator and are thus referred to as

hybrid or ITR-type sensory kinases (Ishige et al. 1994). A smaller fraction of the

hybrid sensors possess an additional output domain at the carboxyl terminus and are

referred to as ITRO-type or unorthodox sensor kinases. The response regulator, in

most cases, is a transcription factor that, once phosphorylated, modulates the

expression of its target genes appropriately (Parkinson 1993). Response

regulators other than transcription factors have also been reported. For example, the

CheY response regulator, after being phosphorylated by the kinase CheA, binds to a

flagellar motor to promote clockwise rotation of the flagellum (Macnab, 1996).

The tight connection between the functionally coupled bacterial genes and their

chromosomal vicinity is a common feature of bacterial genomes (Overbeek et al.

1999; Dandekar et al. 1998). Most of the 2CS genes encoding functionally coupled

sensors and regulators are also physically linked as an operon in the genomes. Two

models, the co-evolution and recruitment models, have been proposed to explain the

evolution of 2CS genes. The co-evolution model proposes that the majority of the

2CS genes in a genome have arisen by gene duplication and subsequent

differentiation of the ancestral 2CSs (Koretke et al. 2000). This is supported by the

fact that many of the coupled 2CS genes are concurrent in a genome. On the other

hand, the recruitment model suggests that some of the 2CS operons have evolved as a

result of an assembly of a sensor gene and a regulator gene from heterologous 2CSs.

Signal transduction between 2CS components encoded by distantly located genes has been

reported in the sporulation system of B. subtilis (Kobayashi et al., 1995) and for E. coli

hybrid sensor kinases (Hoch and Silhavy, 1995). It is conceivable that, in

the recruitment model, further assembly of such distantly located 2CS genes into an

operon would be beneficial for coordinated control of the system.

Pseudomonas aeruginosa is a versatile Gram-negative bacterium that grows in a

variety of environmental habitats. Patients with cystic fibrosis, burn victims, and

patients requiring extensive hospitalization are particularly at risk of P. aeruginosa

infections (Goldberg et al. 2000). The complete genome sequence of P. aeruginosa

PAO1 has been determined and published (Stover et al. 2000). The 6.3-Mb genome

contains 5,570 predicted genes, of which 123 2CSs were annotated according to the

most recently updated database of the Pseudomonas Genome Project. The number of

2CS genes in the P. aeruginosa genome is relatively high in comparison with that in the E.

coli and Bacillus genomes, which is likely to be advantageous for the bacteria to adapt

to different environments. Nevertheless, the function of approximately two-thirds of

the 2CS genes has not been characterized. In this study, we analyzed

the phylogenetic relationships of the 2CS genes in P. aeruginosa PAO1 in the hope

that they might shed light on their functions.

2.2 Materials and Methods

Nucleotide Sequence Source and Sequence Analysis

The known and putative 2CS genes annotated by the Pseudomonas aeruginosa

Community Annotation Project (PseudoCAP) were obtained from the web site

http://www.pseudomonas.com. The sequences of sensor kinase genes, hybrid sensor

genes, and response regulator genes were collected and processed into FASTA format.

Analysis of the 2CS was performed by homology search using the BLAST programs

provided online by the National Center for Biotechnology Information.

Multiple Sequence Alignment and Phylogenetic Estimation

Neighbor-joining (NJ) trees were built from the deduced amino acid sequences of

the sensor kinases and response regulators using CLUSTAL W 1.81 (Thompson et

al. 1994). The default substitution matrix (Gonnet) was used for alignments, and

positions with gaps were excluded from tree construction. The resultant trees were

visualized by TreeView 1.6 (Page, 1996) and MEGA2 (Kumar et al. 2001).
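
The exclusion of gapped positions described above can be illustrated with a minimal Python sketch. It is not the procedure used by CLUSTAL W or BioEdit; the function name strip_gapped_columns and the toy sequences are hypothetical and serve only to show what "positions with gaps were excluded" means for an alignment.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    # Keep a column only if no sequence has a gap character at that position.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, not taken from the PAO1 data).
toy = {
    "sensorA": "MK-LHEL",
    "sensorB": "MKALHDL",
    "sensorC": "MRAL-EL",
}
stripped = strip_gapped_columns(toy)
for name, seq in stripped.items():
    print(name, seq)
```

Columns 3 and 5 of the toy alignment each contain a gap in one sequence, so both are dropped from all three sequences before any distance or likelihood calculation.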

For the maximum likelihood analysis, multiple sequence alignments of the amino

acid sequences of sensors and regulators from homologous gene clusters were

performed, also using CLUSTAL W. The positions containing alignment gaps were

subsequently excluded manually using BioEdit 4.8.6 (Hall, 1999). Pair-wise distances

were analyzed by the PROML program (with the JTT amino acid substitution model) in

PHYLIP 3.6 (Felsenstein, 1993), and 1000 bootstrap replicates were

performed for each analysis. Graphical representations of the multiple amino acid

sequence alignments (sequence logos) were generated using WebLogo (Crooks et al.,

2004).

GC%, G+C Content in the 3rd Position of Synonymous Codons (GC3s), and The

Effective Number of Codons Used in a Gene (Nc)

The GC% and the GC% at the 3rd position of synonymous codons (GC3s) were

calculated using CodonW (Peden 1999). Nc, the measure of overall codon bias in a

gene, was calculated by using the CHIPS program with Wright's Nc statistic for an

effective number of the codons used (Wright 1990).
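
As a rough illustration of the two composition measures, the following Python sketch computes GC% and GC3s for a single coding sequence. It assumes the usual definition of GC3s, in which Met (ATG), Trp (TGG), and the three stop codons are excluded because they have no synonymous alternative; it is not the CodonW implementation, the function names are hypothetical, and Nc is not computed here.

```python
# Codons without synonymous alternatives (standard genetic code), plus stops.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_percent(seq):
    """Percentage of G and C over all bases of the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def gc3s_percent(cds):
    """Percentage of G or C at the third position of synonymous codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    synonymous = [c for c in codons
                  if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not synonymous:
        raise ValueError("no synonymous codons found")
    return 100.0 * sum(c[2] in "GC" for c in synonymous) / len(synonymous)


# Toy coding sequence (hypothetical): ATG GCC AAG CTG TGG GAA TAA
toy_cds = "ATGGCCAAGCTGTGGGAATAA"
gc = gc_percent(toy_cds)
gc3s = gc3s_percent(toy_cds)
print(f"GC% = {gc:.2f}, GC3s = {gc3s:.2f}")
```

In the toy sequence only GCC, AAG, CTG, and GAA count toward GC3s, and three of the four end in G or C.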

2.3 Results and Discussions

Organization of 2CS encoding gene clusters

The 123 annotated 2CSs in the P. aeruginosa PAO1 genome, including 64 sensor

and 59 regulator genes, were chosen for this study. The discrepancy in the numbers of

sensor kinases and regulator genes as compared to the earlier reports of 64 and 63

sensor kinases and regulator genes respectively (Rodrigue et al. 2000) is most likely

due to the recent refinement of the annotation by the Pseudomonas Genome Project.

All these 2CS genes were first classified according to their relative location, gene

organization, and transcription orientation. As shown in Table 1, each sensor gene

was found to be located near a regulator gene, either directly linked or

separated by fewer than 3 open reading frames (ORFs), except for 14 sensor genes,

which were assigned as orphan sensors in Group IV. The most common type of gene

organization, represented by the 29 2CS gene pairs in Group I, has the regulator

gene located upstream of the sensor gene. Two 2CS gene clusters within this

group contain an additional 2CS gene (a sensor gene in Group Ib and a regulator

gene in Group Ic), which is transcribed in the opposite direction to that of the paired

regulator and sensor genes. Group Id contains 4 gene clusters with one or three

non-2CS ORFs located between the regulator and the sensor genes.

Group II contains 16 pairs of 2CS genes with the gene order of sensor followed by

regulator. There are four 2CS gene clusters in Group III, where the regulator and the

sensor genes are transcribed divergently. The rest of the 2CS genes, including 14

sensors and 8 regulator genes, are not physically linked to any 2CS gene and were

hence referred to as orphan sensors and regulators respectively.
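
The grouping rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: genes are described by a hypothetical ordinal ORF index and strand, the gene names are invented, and the max_intervening threshold of 3 is an assumption chosen so that the Group Id clusters with three intervening ORFs still count as linked. The subgroups Ia to Id are not distinguished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    index: int   # ordinal position of the ORF along the chromosome (assumed)
    strand: str  # "+" or "-"


def classify_pair(sensor, regulator, max_intervening=3):
    """Assign a sensor/regulator pair to an organization group."""
    intervening = abs(sensor.index - regulator.index) - 1
    if intervening > max_intervening:
        # A gene unlinked to every partner would be a Group IV orphan.
        return "unlinked"
    if sensor.strand == regulator.strand:
        # On "+" the upstream gene has the lower index; on "-" the higher one.
        regulator_first = (regulator.index < sensor.index) == (sensor.strand == "+")
        return "I" if regulator_first else "II"
    left, right = sorted((sensor, regulator), key=lambda g: g.index)
    if left.strand == "-" and right.strand == "+":
        return "III"  # divergently transcribed
    return "convergent"  # not a group described in the text


# Hypothetical genes used only to exercise each branch.
group_i = classify_pair(Gene("senX", 11, "+"), Gene("regX", 10, "+"))
group_i_minus = classify_pair(Gene("senY", 20, "-"), Gene("regY", 21, "-"))
group_ii = classify_pair(Gene("senZ", 30, "+"), Gene("regZ", 31, "+"))
group_iii = classify_pair(Gene("senW", 40, "-"), Gene("regW", 41, "+"))
far_apart = classify_pair(Gene("senV", 50, "+"), Gene("regV", 60, "+"))
print(group_i, group_i_minus, group_ii, group_iii, far_apart)
```

The strand test matters because "upstream" is defined relative to the direction of transcription, so a regulator with the higher index is still upstream when both genes lie on the minus strand.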

Analysis of the 2CS genes based on functional motifs

These 2CS genes were further analyzed on the basis of the functional motifs of

their gene products. The average length of the response regulator genes was

approximately 850 bp. Twenty-four of the regulators are members of the OmpR

transcription factor family, which forms the largest group of the 2CS response

regulators in P. aeruginosa PAO1. Apart from the 11 NarL-, 8 NtrC-, and 5

CheY-type regulators, the remaining 11 regulators, which carry signal-receiving motifs, were

found to lack the conserved C-domain used for classification and were therefore

listed as unclassified (Table 1).

In contrast to that of the regulator genes, the size of the sensor genes varies greatly,

ranging from 650 bp to 7418 bp. The 64 sensor genes were classified as follows:

42 IT (classical), 12 ITR (hybrid), 5 ITRO (unorthodox), and 4 CheA-type.

PA0471, which encodes the iron sensor Fur, is the only unclassified sensor gene

(Table 1).

Combining the analysis of gene organization and the structural motifs of these

gene products, several interesting features were noted as follows:

(1) Almost all (20 out of 23) of the Group Ia gene clusters carry an OmpR-like

regulator, and most of the OmpR-like regulators (22 out of 24) have accompanying

classical (IT) sensor genes located immediately downstream. This observation indicates

that the regulator-to-classical-sensor gene order is preferred by the family of

OmpR-like regulators. It is likely that most of the regulator-sensor pairs in Group Ia

co-evolved by duplication from an ancestral OmpR-IT pair, so that the gene

organization remained unchanged. Moreover, 15 out of the 20 OmpR-IT 2CS gene

clusters have a regulator gene that overlaps the downstream sensor gene,

which also supports the co-evolution model. In the E. coli K12 MG1655 genome, 11

of the 14 2CSs of the OmpR-like family exhibit the regulator-to-sensor gene order

according to the KEGG database (Kanehisa et al., 2002). A similar phenomenon is

also observed in the genome of Bacillus subtilis, where all of the 14 2CS genes of the

OmpR-family are in a regulator-to-sensor organization (Fabret et al., 1999). It is

likely that most of the 2CS gene clusters of the OmpR-family have originated from a

common progenitor before the speciation of the proteobacteria and Gram-positive

bacteria.

(2) The NarL-like regulator genes, which are found in Groups I, II, III, and IV,

are linked to genes of IT-, ITR-, or ITRO-type sensors,

suggesting a strategy different from co-evolution. They were probably recruited

as components during evolution.

(3) Ten out of 11 ITR-type hybrid sensors are orphans. The exception is PA1396,

which is located next to the NarL-like regulator PA1397 in a divergent transcriptional

orientation. The ITR-type sensor, also referred to as the hybrid sensor kinase, contains

a regulator-like receiver domain following the input and transmitter domains.

Interestingly, most of the ITRO-type sensors, which carry an additional Hpt

(histidine-containing phosphotransfer) domain compared with the

ITR-type sensors, are adjacent to a response regulator. It has been demonstrated that

the phosphorelay specificity of ITRO-type sensors, such as BvgS of Bordetella

pertussis and EvgS of E. coli, is determined by the Hpt domain (Perraud et al.,

1998). The phosphorelay between the ITR-type kinases and the corresponding

response regulators in Vibrio harveyi and Saccharomyces cerevisiae also occurs

through Hpt modules, which are, however, encoded by genes located distantly from the

2CS genes (Freeman et al., 1999; Posas et al., 1996). In the P. aeruginosa PAO1

2000). It is likely that without a combined Hpt domain, the ITR sensors act in concert

with the Hpt modules to perform the multiple-step phosphorelay in a manner similar

to that of the ITRO systems. It has been proposed that a receiver or receiver-Hpt

domain may be fused to an IT-type kinase to yield hybrid or unorthodox sensor

kinases (Grebe and Stock, 1999). Recruitment of these domains may confer on these

systems an additional flexibility as compared to the classical two-component signal

transduction.

Phylogenetic analysis of the 2CS genes

The evolutionary relationships among the sensor and regulator genes were

estimated by multiple sequence alignment of their deduced amino acid sequences

using CLUSTAL W followed by the neighbor-joining method of tree construction.

The 2CSs including four CheA-type sensors (PA0178, PA0413, PA1458, PA3704), 5

sensors (PA1396, PA3078, PA3878, PA4197, PA5262, PA0471) of unusual

length, and one unclassified regulator (PA4843) aligned poorly with the remaining

sequences and were thus excluded from the NJ tree construction.

As shown in Figure 8A, the ITR- and ITRO-type sensors apparently formed a

group of sub-trees except for the branches h1 and h2. Their close association in the

tree suggests that both ITR- and ITRO-type sensors share a common ancestor.

However, only 4 ITRO- and 1 ITR-type sensors were observed in the 49

sensor-regulator pairs. Why the multi-step phosphorelay system is

not favored in the bacterium remains to be answered.

Most OmpR-like and NtrC-like regulator encoding genes were found to form a

cluster in the tree (Figure 8B), whereas genes encoding the members in the NarL

family and the unclassified regulators were scattered in the tree indicating a lower

sequence similarity. In order to analyze the historical associations between sensors

and regulators, each node in the sensor tree was first assigned an association with the

cognate regulator. Each terminal node of the sensor tree was therefore represented by

its cognate regulator, while each of the internal nodes was represented by the union of

the cognate regulators of all the descendants of that node. For each node in the sensor

tree, the corresponding node in the regulator tree with the same descendant regulators

was searched for. In this way, six clades (congruent monophyletic groups)

composed of 11 sensor-regulator pairs, designated clades A to F, were

identified (Figure 8A, 8B). As shown in Table 2, the distances of the 2CS pairs in

each clade, calculated with PROTDIST, were clearly shorter than those

of the 2CS pairs containing NarL-like regulators. The 2CS pairs in each of the clades

appeared to be the most closely related and most likely to be derived from a recent

co-evolution of the 2CSs in 20 different genomes (Koretke et al., 2000).
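
The node-matching procedure used to find congruent clades can be sketched in Python as follows. The sketch assumes that both trees are available as rooted nested tuples of leaf names and that a dictionary maps each sensor to its cognate regulator; the tree shapes, the names S1 to S5 and R1 to R5, and the helper functions are hypothetical and are not taken from Figure 8.

```python
def leaf_sets(tree):
    """Collect the set of descendant leaves under every internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a plain name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def congruent_clades(sensor_tree, regulator_tree, cognate):
    """Return regulator sets that form a clade in both trees.

    Each sensor clade is relabeled by the cognate regulators of its leaves and
    then looked up among the clades of the regulator tree.
    """
    regulator_sets = leaf_sets(regulator_tree)
    # The root clade contains every leaf and is trivially shared, so skip it.
    whole_tree = max(regulator_sets, key=len, default=frozenset())
    relabeled = {frozenset(cognate[s] for s in leaves)
                 for leaves in leaf_sets(sensor_tree)}
    shared = (relabeled & regulator_sets) - {whole_tree}
    return sorted(tuple(sorted(clade)) for clade in shared)


# Toy example with five hypothetical sensor-regulator pairs.
cognate = {f"S{i}": f"R{i}" for i in range(1, 6)}
sensor_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
regulator_tree = (("R1", "R2"), ("R3", ("R4", "R5")))
result = congruent_clades(sensor_tree, regulator_tree, cognate)
print(result)
```

In this toy case the pairs S1/R1 with S2/R2 and S4/R4 with S5/R5 group together in both trees, whereas S3 and R3 attach to different neighbors, so the clade containing them is not congruent.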

In order to further assess the co-evolution relationships, maximum-likelihood (ML)

estimation of the phylogeny was subsequently carried out for the specific groups of

OmpR-, NtrC-, and NarL-like regulator-containing 2CSs (Figure 9). As shown in

Figures 9A and 9B, the 2CS pairs of the OmpR and NtrC families also form congruent

clades, whereas the ML trees for the 2CS pairs of NarL-like regulators

show different topologies. In Figure 9C, PA1397 in the regulator tree is only one

branch away from PA3879 (narL); however, their corresponding sensors PA1396 and

PA3878 (narX) are distant from each other. This supports the

recruitment model, in which 2CS pairs are assembled from a separate sensor and regulator.

It is also consistent with the distance-based analysis, in which the distances between these

NarL-group 2CS pairs are relatively long or beyond determination (Table 2).

To measure the dissimilarity between the sensor and regulator trees, the number of

resolved and different quartets was determined (Estabrook et al., 1985). For the trees of the 23

sensor-regulator pairs of the OmpR group, the quartet dissimilarity is 4,251. For

the 8 sensor-regulator pairs of the NarL and NtrC groups, the values are 41 and 31,

respectively. To compare the tree dissimilarities of the different groups, random trees

with the same number of OTUs were generated and their dissimilarities measured.

Figure 10 shows the distribution of tree dissimilarities for the set of random

trees. Fewer than 3.57% and 5.65% of the random trees have smaller quartet

dissimilarities than the observed trees of the OmpR and NtrC groups, respectively. In contrast,

fewer than 19.49% of the random trees have smaller quartet dissimilarities than that

calculated for the NarL group (Figure 10). These results indicate that the tree

congruence of the OmpR and NtrC groups is higher than that of the

NarL group.
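
A minimal Python sketch of the quartet measure is given below. It counts the four-leaf subsets that are resolved in both trees but with different topologies, which is our reading of "resolved and different" quartets. The trees are assumed to be nested tuples over the same leaf names, read as unrooted splits; the function names and toy trees are hypothetical, and the comparison against random trees is not reproduced.

```python
from itertools import combinations
from math import comb


def leaf_sets(tree):
    """Collect the set of descendant leaves under every internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(quartet, clades):
    """Return the 2|2 split a tree induces on four leaves, or None if unresolved."""
    q = frozenset(quartet)
    for clade in clades:
        inside = clade & q
        if len(inside) == 2:  # this branch separates two leaves from the other two
            return frozenset([inside, q - inside])
    return None


def quartet_dissimilarity(tree_a, tree_b):
    """Count quartets resolved in both trees but with different topologies."""
    clades_a, clades_b = leaf_sets(tree_a), leaf_sets(tree_b)
    leaves = sorted(max(clades_a, key=len))
    different = 0
    for quartet in combinations(leaves, 4):
        top_a = quartet_topology(quartet, clades_a)
        top_b = quartet_topology(quartet, clades_b)
        if top_a is not None and top_b is not None and top_a != top_b:
            different += 1
    return different


# Toy trees over five hypothetical OTUs.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
score = quartet_dissimilarity(tree_a, tree_b)
print(score, "of", comb(5, 4), "quartets differ")
```

The number of possible quartets grows as n choose 4, giving 8,855 quartets for 23 OTUs and 70 for 8 OTUs, which puts the reported values of 4,251, 41, and 31 into proportion.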

Analysis of the sequences around the phosphorylated histidine of the sensor kinases

It has been shown for the B. subtilis 2CSs that the classification of the sequences

flanking the histidine in the kinase correlates with the class of the cognate regulator

(OmpR, NarL, etc.) (Fabret et al., 1999). The sequences around the phosphorylated

histidine residue were therefore used to further classify the sensor kinases. We found that

the histidine-containing motifs could be classified into three homologous groups;

their sequence logos are shown in Figure 11. The 4 CheY-type regulators were

paired with the Class I kinases. Seven out of nine NtrC-type regulators were paired

with the Class II kinases. The two exceptions were PA5484 with a regulator-to-sensor

transcription order, and PA4293 with two ORFs located in between the regulator and

its cognate sensor. Most interestingly, all the OmpR-type regulators were paired with

the Class III kinases. Whereas the NtrC- and OmpR-type regulators

were paired with the Class II and III sensors, most of the others were paired with

sensors showing low sequence similarity around the histidine residue. This

supports the hypothesis that sensors and their cognate regulators

of the NtrC or OmpR type have co-evolved as a unit from a common ancestor.

Functional analysis of the most recently duplicated 2CS sensor-regulator pairs

To assess the functions of the 2CSs identified in each of the congruent clades,

we compared these 2CSs with 2CSs of known function in other species

and examined the characterized properties of their adjacent genes. Several interesting

findings were noted:

(1) The 2CS genes in one clade may share a similar function. For instance, in

clade A, the two 2CS gene pairs pirS (PA0930)/pirR (PA0929) and pfeS

(PA2687)/pfeR (PA2686) are parts of the operons pirRSA and pfeRSA, respectively

(Figure 12A). Both operons encode siderophore-mediated iron uptake systems and are

under the control of the Fur protein (Ochsner and Vasil, 1996). This indicates that the

paralogous groups continue to carry out a similar function after gene duplication.

Functional redundancy is also seen in the members of clade C: PA3044/PA3045 and

PA3946/PA3948, which are likely virulence-related 2CS paralogs (Figure 12B). Both

2CS gene pairs exhibit significant sequence similarity to Bordetella

parapertussis bvgAS, which has been demonstrated to participate in regulating the

synthesis of many virulence factors (Bock and Gross 2001). Moreover, the regulator

gene PA3947 has been reported to encode a homologue with a 45% sequence

similarity to Vibrio cholerae virulence-related protein VieA (Lee et al., 1998) and to

the regulator PvrR that controls antibiotic susceptibility and biofilm formation in P.

aeruginosa PA14 (Drenkard and Ausubel 2002). The clustering structure suggests a

related function since gene clusters in a bacterial genome may possess the same

function (Overbeek et al., 1999).

(2) Gene rearrangement may have occurred after duplication of the co-evolved

2CS gene cluster. The virulence-associated 2CS gene PA0928 (lemA) is adjacent to

the pirRSA operon (Figure 12A). In the Pseudomonas syringae genome, the lemA

gene is clustered with the cysteine synthase-encoding gene cysM (PA0932) in a

divergent transcriptional orientation, suggesting that the region has been subject to

gene rearrangement during speciation of P. syringae and P. aeruginosa. PA0928,

which lies relatively distant from the sensor PA0930 in the tree, appears to have been

recruited later by the 2CS pair PA0929/0930. As shown in Figure 12B, the 2CS pair

PA3045 and PA3044 in clade C is co-transcribed. However, PA3946 and

PA3948 in this clade are transcribed divergently, which probably indicates that a

gene rearrangement occurred after the duplication.

(3) A group of functionally related 2CSs is probably required for controlling

the translocation of metabolites and ions. As shown in Figure 13A, all three gene pairs

(PA1335/PA1336, PA5165/PA5166, and PA5511/PA5512) in clade D appear to be

homologs of Rhizobium meliloti DctBD, which controls the transport of

C4-dicarboxylic acids in R. meliloti (Wang et al. 1989). The downstream genes

clustered with PA5165/PA5166 are homologs of DctPQM (67%, 47% and 72%

sequence similarities), which are essential for transport of C4-dicarboxylates in

Rhodobacter capsulatus (Shaw et al., 1991). Several genes encoding homologs of

glutaminase-asparaginase (88% sequence similarity) of Pseudomonas 7A

(Holcenberg et al., 1997), E. coli glutamyltranspeptidase (62% sequence similarity)

(Suzuki et al., 1988), and E. coli glutamate-aspartate ABC transporters (> 68%

sequence similarities) (Oshima et al., 1996) respectively were found upstream of the

PA1335/PA1336. Moreover, a putative S. typhimurium amino acid

permease-encoding gene was found adjacent to PA5511/PA5512. The regulators of the

four 2CS gene pairs in clade E showed significant similarities (>74% sequence

similarities) to the transcriptional regulator IrlR of Burkholderia pseudomallei, (Jones
