2.1 Introduction
The two-component system (2CS) is the means by which bacteria commonly
regulate adaptive responses to changing environments. A 2CS typically comprises a
sensor histidine kinase and a response regulator (Stock et al. 1990). The sensor kinase
consists of at least one signal recognition (input) domain coupled to an autokinase
(transmitter) domain. Signal binding to the input domain activates the
autokinase, which hydrolyzes an ATP molecule to phosphorylate a conserved
histidine residue (Stock et al. 1989). The phosphate group is subsequently transferred
to a conserved aspartate residue in the receiver domain of a response regulator.
Most sensor kinases contain one input domain and one transmitter domain and are
hence called classical (IT-type) sensors. Some sensors contain both the sensor kinase
signature and a receiver domain of the response regulator and are thus referred to as
hybrid or ITR-type sensor kinases (Ishige et al. 1994). A smaller fraction of the
hybrid sensors possess an additional output domain at the carboxyl terminus and are
referred to as ITRO-type or unorthodox sensor kinases. The response regulator, in
most cases, is a transcription factor that modulates appropriate expression of
its target genes (Parkinson 1993). Response
regulators other than transcription factors have also been reported. For example, the
CheY response regulator, after being phosphorylated by the kinase CheA, binds to the
flagellar motor to promote clockwise rotation of the flagellum (Macnab, 1996).
The tight connection between the functionally coupled bacterial genes and their
chromosomal vicinity is a common feature of bacterial genomes (Overbeek et al.
1999; Dandekar et al. 1998). Most of the 2CS genes encoding functionally coupled
sensors and regulators are also physically linked as an operon in the genome. Two
models, the co-evolution and recruitment models, have been proposed to explain the
evolution of 2CS genes. The co-evolution model proposes that the majority of the
2CS genes in a genome have arisen by gene duplication and subsequent
differentiation of the ancestral 2CSs (Koretke et al. 2000). This is supported by the
fact that many of the coupled 2CS genes are concurrent in a genome. On the other
hand, the recruitment model suggests that some of the 2CS operons have evolved as a
result of an assembly of a sensor gene and a regulator gene from heterologous 2CSs.
Signal transduction between 2CS components encoded by distantly located genes has been
reported in the sporulation system of B. subtilis (Kobayashi et al., 1995) and in E. coli
hybrid sensor kinases (Hoch and Silhavy, 1995). It is conceivable that, in
the recruitment model, further assembly of such distantly located 2CS genes into an
operon would be beneficial for coordinated control of the system.
Pseudomonas aeruginosa is a versatile Gram-negative bacterium that grows in a
variety of environmental habitats. Patients with cystic fibrosis, burn victims, and
patients requiring extensive hospitalization are particularly at risk of P. aeruginosa
infections (Goldberg et al. 2000). The complete genome sequence of P. aeruginosa
PAO1 has been determined and published (Stover et al. 2000). The 6.3-Mb genome
contains 5,570 predicted genes, of which 123 were annotated as 2CS genes according to the
most recently updated database of the Pseudomonas Genome Project. The number of
2CS genes in the P. aeruginosa genome is relatively high in comparison with that in the E.
coli and Bacillus genomes, which is likely to help the bacterium adapt
to different environments. Nevertheless, the functions of approximately two-thirds of
the 2CS genes have not been characterized. In this study, we analyzed
the phylogenetic relationships of the 2CS genes in P. aeruginosa PAO1 in the hope
of gaining insight into their functions.
2.2 Materials and Methods
Nucleotide Sequence Source and Sequence Analysis
The known and putative 2CS genes annotated by the Pseudomonas aeruginosa
Community Annotation Project (PseudoCAP) were obtained from the web site
http://www.pseudomonas.com. The sequences of sensor kinase genes, hybrid sensor
genes, and response regulator genes were collected and processed into FASTA format.
Analysis of the 2CS genes was performed by homology search using the BLAST programs
provided by the National Center for Biotechnology Information through the Internet.
Multiple Sequence Alignment and Phylogenetic Estimation
Neighbor-joining (NJ) trees were built from the deduced amino acid sequences of
sensor kinases and response regulators using CLUSTAL W 1.81 (Thompson et
al. 1994). The default substitution matrix (Gonnet) was used for alignments, and
positions with gaps were excluded from tree construction. The resultant trees were
visualized by TreeView 1.6 (Page, 1996) and MEGA2 (Kumar et al. 2001).
For the maximum likelihood analysis, multiple sequence alignments of the amino
acid sequences of sensors and regulators from homologous gene clusters were
performed, also using CLUSTAL W. The positions containing alignment gaps were
subsequently excluded manually using BioEdit 4.8.6 (Hall, 1999). Pair-wise distances
were analyzed by the PROML algorithm (with JTT amino acid change model) in
PHYLIP 3.6 (Felsenstein, 1993) and 1000 replications of bootstrap sampling were
performed for each analysis. Graphical representations of the multiple amino acid
sequence alignments (sequence logos) were generated using WebLogo (Crooks et al.,
2004).
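The gap-exclusion and bootstrap-sampling steps described above can be illustrated
with a minimal Python sketch. It is not the CLUSTAL W, BioEdit, or PHYLIP
implementation. The function names and the toy alignment are hypothetical, and the
alignment is assumed to be a dictionary of equal-length, already aligned sequences.

```python
import random


def strip_gap_columns(alignment):
    """Drop every alignment column in which at least one sequence has a gap ('-')."""
    names = list(alignment)
    if len({len(alignment[n]) for n in names}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    kept = [col for col in zip(*(alignment[n] for n in names)) if "-" not in col]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        picks = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][i] for i in picks) for n in names}


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"a": "MK-LV", "b": "MKALV", "c": "M-ALI"}
clean = strip_gap_columns(toy)
replicates = list(bootstrap_replicates(clean, n_replicates=5, seed=1))
```

Each replicate has the same length as the gap-free alignment and would be passed to
the tree-building step in turn; the support for a branch is the fraction of
replicate trees that contain it.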
GC%, G+C Content in the 3rd Position of Synonymous Codons (GC3s), and The
Effective Number of Codons Used in a Gene (Nc)
The GC% and GC% in the 3rd position of synonymous codons (GC3s) were
calculated using CodonW (Peden 1999). Nc, a measure of overall codon bias in a
gene, was calculated using the CHIPS program with Wright's Nc statistic for the
effective number of codons (Wright 1990).
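As an illustration of the two G+C measures, the following Python sketch computes
GC% and GC3s for a single coding sequence. It is not CodonW's code. It assumes
that GC3s is taken over codons that have synonymous alternatives, so that the Met
and Trp codons and the stop codons are skipped, and that the sequence is in frame.
The function names are hypothetical.

```python
# Codons without synonymous alternatives (Met, Trp) and the three stop codons
# are excluded from GC3s under the assumption stated above.
NOT_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_percent(seq):
    """Percentage of G and C over all bases of the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc3s_percent(cds):
    """Percentage of G or C at the third position of synonymous codons."""
    cds = cds.upper()
    end = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, end, 3)]
    usable = [c for c in codons
              if c not in NOT_SYNONYMOUS and set(c) <= set("ACGT")]
    if not usable:
        raise ValueError("no synonymous codons in sequence")
    return 100.0 * sum(c[2] in "GC" for c in usable) / len(usable)


# Toy in-frame sequence (hypothetical): ATG GCC AAA TGG GGC TAA
toy_cds = "ATGGCCAAATGGGGCTAA"
print(gc_percent(toy_cds), gc3s_percent(toy_cds))
```

In the toy sequence only GCC, AAA, and GGC count towards GC3s, and two of the
three end in G or C.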
2.3 Results and Discussions
Organization of 2CS encoding gene clusters
The 123 annotated 2CSs in the P. aeruginosa PAO1 genome, including 64 sensor
and 59 regulator genes, were chosen for this study. The discrepancy in the numbers of
sensor kinases and regulator genes as compared to the earlier reports of 64 and 63
sensor kinases and regulator genes respectively (Rodrigue et al. 2000) is most likely
due to the recent refinement of the annotation by the Pseudomonas Genome Project.
All these 2CS genes were first classified according to their relative location, gene
organization, and transcription orientation. As shown in Table 1, each sensor gene
was found to be located adjacent to a regulator gene, either directly linked or
separated by fewer than 3 open reading frames (ORFs), except for 14 sensor genes
that were assigned as orphan sensors in Group IV. The most common type of gene
organization, represented by the 29 2CS gene pairs in Group I, is one in which the regulator
gene is located upstream of the sensor gene. Two 2CS gene clusters within this
group contained an additional 2CS gene (a sensor gene in Group Ib, and a regulator
gene in Group Ic), which is transcribed in the opposite direction to that of the paired
regulator and sensor genes. Group Id contains 4 gene clusters with one or three
non-2CS ORFs located between the regulator and the sensor genes.
Group II contains 16 pairs of 2CS genes with the gene order of sensor followed by
regulator. There are four 2CS gene clusters in Group III, where the regulator and the
sensor genes are transcribed divergently. The rest of the 2CS genes, including 14
sensors and 8 regulator genes, are not physically linked to any 2CS gene and were
hence referred to as orphan sensors and regulators respectively.
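The grouping rules above can be summarized in a short Python sketch. It is a
simplification under stated assumptions, not the procedure used to build Table 1:
genes are described only by their ORF order and strand, any opposite-strand pair
is placed in Group III, and the permitted number of intervening ORFs is a
hypothetical parameter. The Gene class and classify_pair function are
illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    index: int   # ORF order along the chromosome
    strand: str  # "+" or "-"


def classify_pair(sensor, regulator, max_intervening=3):
    """Assign a sensor/regulator pair to Group I, II, III, or IV (unlinked)."""
    intervening = abs(sensor.index - regulator.index) - 1
    if intervening > max_intervening:
        return "IV"   # not physically linked: treated as orphans
    if sensor.strand != regulator.strand:
        return "III"  # opposite strands (assumed divergent here)
    # On the + strand the lower index is upstream; on the - strand the higher.
    regulator_first = (regulator.index < sensor.index) == (sensor.strand == "+")
    return "I" if regulator_first else "II"


# Hypothetical coordinates, for illustration only.
print(classify_pair(Gene(11, "+"), Gene(10, "+")))  # regulator upstream of sensor
print(classify_pair(Gene(10, "+"), Gene(11, "-")))  # opposite strands
```

The subgroups Ia to Id would need further information, such as the presence of an
additional 2CS gene or the number of intervening non-2CS ORFs.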
Analysis of the 2CS genes based on functional motifs
These 2CS genes were further analyzed on the basis of the functional motifs of
their gene products. The average length of the response regulator genes was
approximately 850 bp. Twenty-four of the regulators are members of the OmpR
transcription factor family, which forms the largest group of the 2CS response
regulators in P. aeruginosa PAO1. Apart from the 11 NarL-, 8 NtrC-, and 5
CheY-type regulators, the remaining 11 regulators, which carry signal-receiving motifs, were
found not to contain the conserved C-domain used for classification and were therefore
listed as unclassified (Table 1).
In contrast to the regulator genes, the sensor genes vary greatly in size,
ranging from 650 bp to 7418 bp. The 64 sensor genes were classified as follows:
42 IT (classic), 12 ITR (hybrid), 5 ITRO (unorthodox), and 4 CheA-type.
PA0471, which encodes the iron sensor Fur, is the only unclassified sensor gene
(Table 1).
Combining the analysis of gene organization and the structural motifs of these
gene products, several interesting features were noted as follows:
(1) Almost all (20 out of 23) of the Group Ia gene clusters carry an OmpR-like
regulator, and most of the OmpR-like regulators (22 out of 24) have accompanying
classical (IT) sensor genes located immediately downstream. This observation indicates
that the regulator-to-classical-sensor gene order is preferred by the family of
OmpR-like regulators. It is likely that most of the regulator-sensor pairs in Group Ia
co-evolved by duplication from an ancestral OmpR-IT pair so that the gene
organization remained unchanged. Moreover, in 15 of the 20 OmpR-IT 2CS gene
clusters the regulator gene overlaps the downstream sensor gene,
which also supports the co-evolution model. In the E. coli K12 MG1655 genome, 11
of the 14 2CSs of the OmpR-like family exhibit the regulator-to-sensor gene order
according to the KEGG database (Kanehisa et al., 2002). A similar phenomenon is
also observed in the genome of Bacillus subtilis, where all of the 14 2CS genes of the
OmpR-family are in a regulator-to-sensor organization (Fabret et al., 1999). It is
likely that most of the 2CS gene clusters of the OmpR-family have originated from a
common progenitor before the speciation of the proteobacteria and Gram-positive
bacteria.
(2) The NarL-like regulator genes, which are found in Groups I, II, III, and IV,
appeared to be linked to genes of IT-, ITR-, or ITRO-type sensors,
suggesting a different strategy from co-evolution. They were probably recruited
as components during evolution.
(3) Ten of the 11 ITR-type hybrid sensors are orphans. The exception is PA1396,
which is located next to the NarL-like regulator gene PA1397 in a divergent transcriptional
orientation. The ITR-type sensor, also referred to as the hybrid sensor kinase, contains
a regulator-like receiver domain following the input and transmitter domains.
Interestingly, most of the ITRO-type sensors, which carry an additional Hpt
(histidine-containing phosphotransfer) domain compared with the
ITR-type sensors, are adjacent to a response regulator. It has been demonstrated that
the phosphorelay specificity of the ITRO-type sensors, such as the BvgS of Bordetella
pertussis and EvgS of E. coli, was determined by the Hpt domain (Perraud et al.,
1998). The phosphorelay between the ITR-type kinases and the corresponding
response regulators in Vibrio harveyi and Saccharomyces cerevisiae also occurred
through Hpt modules, which are, however, encoded by genes located distantly from the
2CS genes (Freeman et al., 1999; Posas et al., 1996). In the P. aeruginosa PAO1
2000). It is likely that without a combined Hpt domain, the ITR sensors act in concert
with the Hpt modules to perform the multiple-step phosphorelay in a manner similar
to that of the ITRO systems. It has been proposed that a receiver or receiver-Hpt
domain may be fused to an IT-type kinase to yield hybrid or unorthodox sensor
kinases (Grebe and Stock, 1999). Recruitment of these domains may give these
systems additional flexibility compared with classical two-component signal
transduction.
Phylogenetic analysis of the 2CS genes
The evolutionary relationships among the sensor and regulator genes were
estimated by multiple sequence alignment of their deduced amino acid sequences
using CLUSTAL W followed by the neighbor-joining method of tree construction.
The 2CSs including four CheA-type sensors (PA0178, PA0413, PA1458, PA3704), 5
sensors (PA1396, PA3078, PA3878, PA4197, PA5262, PA0471) with extraordinary
length, and one unclassified regulator PA4843 were poorly aligned with the rest of
the sensors and thus were excluded from the NJ tree construction.
As shown in Figure 8A, the ITR- and ITRO-type sensors apparently formed a
group of sub-trees except for the branches h1 and h2. Their close association in the
tree suggests that both ITR- and ITRO-type sensors share a common ancestor.
However, only 4 ITRO- and 1 ITR-type sensors were observed in the 49
sensor-regulator pairs. The question of why the multi-step phosphorelay system is
not favored in the bacteria remains to be answered.
Most OmpR-like and NtrC-like regulator encoding genes were found to form a
cluster in the tree (Figure 8B), whereas genes encoding the members in the NarL
family and the unclassified regulators were scattered in the tree indicating a lower
sequence similarity. In order to analyze the historical associations between sensors
and regulators, each node in the sensor tree was first assigned an association with the
cognate regulator. Each terminal node of the sensor tree was therefore represented by
its cognate regulator, while each of the internal nodes was represented by the union of
the cognate regulators of all the descendants of that node. For each node in the sensor
tree, the corresponding node in the regulator tree with the same descendant regulators
was searched for. In this way, six clades (congruent monophyletic groups)
composed of 11 sensor-regulator pairs, designated clades A to F, were
identified (Figure 8A, 8B). As shown in Table 2, the distances of the 2CS pairs in
each clade, calculated with PROTDIST, were shorter than those
of the 2CS pairs containing NarL-like regulators. The 2CS pairs in each of the clades
appeared to be the most closely related and most likely to be derived from a recent
co-evolution of the 2CSs in 20 different genomes (Koretke et al., 2000).
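The node-matching procedure used to identify congruent clades can be sketched in
Python as follows. This is a minimal illustration rather than the program used in
this study. Trees are assumed to be rooted and written as nested tuples of leaf
names, every sensor is assumed to have exactly one cognate regulator, and the
trivial clade containing all leaves is ignored. The gene names S1 to S5 and R1 to
R5 are hypothetical.

```python
def clade_sets(tree):
    """Return one frozenset of leaf names per internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def congruent_clades(sensor_tree, regulator_tree, cognate):
    """List sensor clades whose cognate regulators also form a clade."""
    regulator_clades = clade_sets(regulator_tree)
    everything = frozenset(cognate.values())
    hits = []
    for clade in clade_sets(sensor_tree):
        # Represent the sensor node by the union of its cognate regulators.
        mapped = frozenset(cognate[s] for s in clade)
        if mapped != everything and mapped in regulator_clades:
            hits.append(sorted(clade))
    return sorted(hits)


# Hypothetical toy trees: only the S1/S2 clade is mirrored by R1/R2.
sensor_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
regulator_tree = ((("R1", "R2"), "R4"), ("R3", "R5"))
cognate = {f"S{i}": f"R{i}" for i in range(1, 6)}
print(congruent_clades(sensor_tree, regulator_tree, cognate))
```

Because NJ trees are unrooted, applying this sketch to real data would first
require a choice of root, which is an additional assumption.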
In order to further assess the co-evolution relationships, maximum-likelihood (ML)
estimation of the phylogeny was subsequently carried out for the specific groups of
OmpR-, NtrC-, and NarL-like regulator-containing 2CSs (Figure 9). As shown in
Figure 9A and 9B, the 2CS pairs of the OmpR and NtrC families also form congruent
clades, whereas the ML trees for the 2CS pairs of NarL-like regulators
show different topologies. In Figure 9C, PA1397 in the regulator tree is only one
branch away from PA3879 (narL); however, their corresponding sensors PA1396 and
PA3878 (narX) are distantly located from each other. This supports the
recruitment model, in which 2CS pairs are assembled from a sensor and a regulator.
It is consistent with the distance-based analysis, in which the distances between these
NarL-group 2CS pairs are relatively long or beyond determination (Table 2).
To measure the dissimilarities between the sensor and regulator trees, the resolved
and different quartets were determined (Estabrook et al., 1985). For the trees of the 23
sensor-regulator pairs of the OmpR group, the quartet dissimilarity is 4,251. For
the 8 sensor-regulator pairs of the NarL and NtrC groups, the values are 41 and 31,
respectively. To compare the tree dissimilarities of different groups, random trees
with the same number of OTUs were generated and their dissimilarities measured.
Figure 10 shows the tree dissimilarities measured from the set of random
trees. Fewer than 3.57% and 5.65% of the random trees have smaller quartet
dissimilarities than the data of the OmpR and NtrC groups, respectively. In contrast,
fewer than 19.49% of the random trees have smaller quartet dissimilarities than that
calculated for the NarL group (Figure 10). The results indicate that the tree
congruence of the OmpR and NtrC groups is higher than that of the
NarL group.
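The quartet comparison can be illustrated with a brute-force Python sketch. It is
not the program used for Figure 10. It counts the quartets that are resolved in
both trees but with different topologies, and it assumes that random trees are
built by repeatedly joining two randomly chosen subtrees and that pairs of random
trees are compared with each other; the source does not specify either point. All
function names are hypothetical. For reference, a tree with 23 leaves contains
8,855 quartets and a tree with 8 leaves contains 70.

```python
import random
from itertools import combinations


def clade_sets(tree):
    """Return one frozenset of leaf names per internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(quartet, clades):
    """Return the 2|2 split of a quartet induced by the tree, or None if unresolved."""
    q = frozenset(quartet)
    for clade in clades:
        inside = q & clade
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_dissimilarity(tree_a, tree_b):
    """Count quartets resolved in both trees but with different topologies."""
    ca, cb = clade_sets(tree_a), clade_sets(tree_b)
    if max(ca, key=len) != max(cb, key=len):
        raise ValueError("trees must have the same leaf names")
    different = 0
    for quartet in combinations(sorted(max(ca, key=len)), 4):
        ta, tb = quartet_topology(quartet, ca), quartet_topology(quartet, cb)
        if ta is not None and tb is not None and ta != tb:
            different += 1
    return different


def random_tree(leaves, rng):
    """Build a random binary tree by joining two random subtrees at a time."""
    nodes = list(leaves)
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        right, left = nodes.pop(j), nodes.pop(i)
        nodes.append((left, right))
    return nodes[0]


def fraction_smaller(observed, n_leaves, n_random=200, seed=0):
    """Fraction of random tree pairs with a smaller dissimilarity than observed."""
    rng = random.Random(seed)
    leaves = [f"t{i}" for i in range(n_leaves)]
    smaller = sum(
        quartet_dissimilarity(random_tree(leaves, rng), random_tree(leaves, rng)) < observed
        for _ in range(n_random)
    )
    return smaller / n_random
```

A call such as fraction_smaller(41, 8) would give the share of random 8-leaf tree
pairs that are more congruent than a pair with 41 differing quartets; the exact
value depends on the random-tree model and the seed.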
Analysis of the sequences around the phosphorylated histidine of the sensor kinases
It has been shown for the B. subtilis 2CSs that classification of the sequences
flanking the histidine in the kinase can be correlated with their cognate regulators
(OmpR, NarL, etc.) (Fabret et al., 1999). The sequences around the phosphorylated
histidine residue were therefore used to further classify the sensor kinases. We found that
the histidine-containing motifs could be classified into three homologous groups;
their sequence logos are shown in Figure 11. The 4 CheY-type regulators were
paired with the Class I kinases. Seven out of nine NtrC-type regulators were paired
with the Class II kinases. The two exceptions were PA5484 with a regulator-to-sensor
transcription order, and PA4293 with two ORFs located in between the regulator and
its cognate sensor. Most interestingly, all the OmpR-type regulators were paired with
were paired with the Class II and III sensors, most of the others were paired with the
sensors showing low sequence similarity around the histidine residue. This
supports the hypothesis that sensors paired with their cognate regulators
of the NtrC- or OmpR-types have co-evolved as a unit from a common ancestor.
Functional analysis of the most recently duplicated 2CS sensor-regulator pairs
To assess the functions of the 2CSs identified in each of the congruent clades,
we compared these 2CSs with those of known function in other species
and examined the characterized properties of their adjacent genes. Several interesting
findings were noted:
(1) The 2CS genes in one clade may share a similar function. For instance, in
clade A, the two 2CS gene pairs pirS (PA0930)/pirR (PA0929) and pfeS
(PA2687)/pfeR (PA2686) are parts of the operons pirRSA and pfeRSA, respectively
(Figure 12A). Both operons encode siderophore-mediated iron uptake systems and are
under the control of the Fur protein (Ochsner and Vasil, 1996). This indicates that the
paralogous groups continue to carry out a similar function after gene duplication.
Functional redundancy is also seen in the members of clade C: PA3044/PA3045 and
PA3946/PA3948, which are likely virulence-related 2CS paralogs (Figure 12B). Both
2CS gene pairs show significant sequence similarity to Bordetella
parapertussis bvgAS, which has been demonstrated to participate in regulating the
synthesis of many virulence factors (Bock and Gross 2001). Moreover, the regulator
gene PA3947 has been reported to encode a homologue with a 45% sequence
similarity to Vibrio cholerae virulence-related protein VieA (Lee et al., 1998) and to
the regulator PvrR that controls antibiotic susceptibility and biofilm formation in P.
aeruginosa PA14 (Drenkard and Ausubel 2002). The clustering structure suggests a
related function since gene clusters in a bacterial genome may possess the same
function (Overbeek et al., 1999).
(2) Gene rearrangement may have occurred after duplication of the co-evolved
2CS gene cluster. The virulence-associated 2CS gene PA0928 (lemA) is adjacent to
the pirRSA operon (Figure 12A). In the Pseudomonas syringae genome, the lemA
gene is clustered with the cysteine synthase encoding gene cysM (PA0932) in a
divergent transcriptional orientation, suggesting that the region has been subject to
gene rearrangement during speciation of P. syringae and P. aeruginosa. PA0928,
which resides relatively distant from the sensor PA0930 in the tree, appears to have been
recruited later by the 2CS pair PA0929/0930. As shown in Figure 12B, the 2CS pair
PA3045 and PA3044 in clade C are co-transcribed. However, the PA3946 and
PA3948 in this clade are transcribed divergently, which is probably indicative that the
duplication.
(3) A group of functionally related 2CSs is probably required for controlling
the translocation of metabolites and ions. As shown in Figure 13A, all three gene pairs
(PA1335/PA1336, PA5165/PA5166, and PA5511/PA5512) in clade D appear to be
homologs of Rhizobium meliloti DctBD, which controls the transport of
C4-dicarboxylic acids in R. meliloti (Wang et al. 1989). The downstream genes
clustered with PA5165/PA5166 are homologs of DctPQM (67%, 47% and 72%
sequence similarities) that are essential for transport of the C4-dicarboxylates in
Rhodobacter capsulatus (Shaw et al., 1991). Several genes encoding homologs of
glutaminase-asparaginase (88% sequence similarity) of Pseudomonas 7A
(Holcenberg et al., 1997), E. coli glutamyltranspeptidase (62% sequence similarity)
(Suzuki et al., 1988), and E. coli glutamate-aspartate ABC transporters (> 68%
sequence similarities) (Oshima et al., 1996), respectively, were found upstream of
PA1335/PA1336. Moreover, a putative S. typhimurium amino acid
permease-encoding gene was found adjacent to PA5511/PA5512. The regulators of the
four 2CS gene pairs in clade E showed significant similarities (>74% sequence
similarities) to the transcriptional regulator IrlR of Burkholderia pseudomallei, (Jones