– An analysis of the OmpR-family
3.1 Introduction
Two-component signal transduction systems (2CSs) are commonly involved in the
regulation of adaptive responses in bacteria. Such a system typically contains a sensor
histidine kinase and a response regulator protein. Signals activate the sensor kinase by
causing autophosphorylation of a conserved histidine residue. The
phosphorylated sensor then transfers the phosphate group to a conserved aspartic acid
residue of the corresponding response regulator, which thereby modulates the
subsequent cellular responses (Hoch, 2000). Most sensor kinases consist of a
highly variable membrane sensor domain that is connected to the conserved histidine
phosphotransferase region by a cytoplasmic linker segment. The response regulator,
in most cases, consists of an amino-terminal receiver domain followed by a linker
region and a carboxyl-terminal DNA-binding domain (Fabret et al., 1999; Hoch,
2000).
Analysis of the 2CSs in Bacillus subtilis revealed that the classification of
regulator family could be correlated with the sequences surrounding the
phosphorylated histidine residue of the cognate kinase, which suggested that the
catalytic domain of kinase and DNA-binding domain of the response regulator
2CS genes in Enterococcus faecalis (Hancock and Perego, 2002). On the basis of
sequence homology in the DNA binding domain of the response regulators (Mizuno,
1997), two major families, OmpR and NarL, were identified in Pseudomonas
aeruginosa PAO1. We have shown that the regulator-to-sensor gene order of the
transcription unit is preserved among the OmpR-family 2CSs in that genome. The
conserved sequences surrounding the active-site histidine of the sensor kinases were
also shown to correlate with the classification of their corresponding response regulators
(Chen et al., 2004), which implied that the protein interaction specificity and gene
organization were preserved during the evolution of the OmpR-family 2CSs.
The OmpR-like proteins are among the most widespread transcriptional
regulators and constitute the largest family of response regulators (Chang and
Stewart, 1998). The family contains at least 15 proteins with extensive amino acid
sequence similarity to OmpR (Mizuno and Tanaka, 1997), including PhoP,
PhoB, KdpE, ArcA, and CreB. The primary osmosensor EnvZ and the response
regulator OmpR are the prototype of the 2CS required for transducing an osmotic
stress signal in bacteria. Upon activation, OmpR functions as a transcription factor
that modulates the expression of the ompF and ompC genes, which encode porins
controlling cell membrane permeability (Roberts et al., 1994; Delgado et al.,
1993). Other members of the OmpR family appear to be involved in diverse
functions such as virulence (PhoQ/PhoP), phosphate regulation (PhoR/PhoB),
and anaerobic nitrite reduction (ResD/ResE) (Hoch and Silhavy, 1995).
Bacteria generally possess multiple 2CSs. To ensure specific signal
transduction, it is conceivable that a preserved specificity of the interacting domain is
present between the kinase and the corresponding response regulator. We are
interested in identifying the evolutionary constraint between the functionally coupled
2CS components, the sensor kinase and the cognate regulator. We report herein an
extended analysis of the gene order and the sequence conservation in the
interacting domain of the OmpR-family 2CSs from 38 species chosen to represent
this diversity. Our results indicate that the specificity between the 2CS components
may have formed before assembly of the genes to form an operon.
3.2 Materials and Methods
Database searches
The completed genomic sequences of archaea (5 species), gamma-proteobacteria
(10 species), beta-proteobacteria (5 species), Chlorobi (1 species),
delta/epsilon-subdivision proteobacteria (1 species), alpha-proteobacteria (1 species),
Gram-positive bacteria (5 species), Actinobacteria (4 species), Deinococcus (1
species), Cyanobacteria (3 species), Aquificae (1 species), and Thermotogae (1
species) were collected and searched for proteins containing the sensor
kinase transmitter domain, the response regulator receiver domain, and the
DNA-binding domain of OmpR-type regulators, respectively. The HMMs for the
transmitter domain (HisKA.hmm), receiver domain (Response_reg.hmm), and OmpR
C-terminal domain (Trans_reg_C.hmm) were obtained from the Pfam database
(http://www.sanger.ac.uk/Software/Pfam/) (Bateman et al., 2004). HMMSEARCH
(HMMER version 2.2) (Durbin et al., 1998) was used to look for these domains
among the total ORFs of the collected genomes.
The ORFs that contain both the receiver domain and the OmpR C-terminal domain
were designated as “OmpR-type” regulators. The regulator-encoding genes that
had an accompanying ORF nearby bearing a histidine kinase transmitter
domain were noted as “paired 2CS”. The regulator-encoding genes
without a physically linked sensor gene were designated “orphan regulators”.
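The designation rules above can be illustrated with a minimal sketch. The text does not state how close an ORF had to be to count as "nearby", so the `max_gap` parameter, the `Orf` record, and the toy ORF names below are assumptions made only for illustration; they are not the procedure or data used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    index: int          # ordinal position of the ORF along the genome
    domains: frozenset  # Pfam domain names with a significant HMM hit


def is_ompr_type(orf):
    """An OmpR-type regulator carries both the receiver and the OmpR C-terminal domain."""
    return {"Response_reg", "Trans_reg_C"} <= orf.domains


def classify_regulators(orfs, max_gap=1):
    """Label each OmpR-type regulator as 'paired 2CS' or 'orphan regulator'.

    max_gap is an assumed threshold: a regulator is called paired when an ORF
    at most max_gap positions away carries a HisKA transmitter domain.
    """
    by_index = {o.index: o for o in orfs}
    labels = {}
    for orf in orfs:
        if not is_ompr_type(orf):
            continue
        offsets = [d for d in range(-max_gap, max_gap + 1) if d != 0]
        neighbours = [by_index.get(orf.index + d) for d in offsets]
        has_kinase = any(n is not None and "HisKA" in n.domains for n in neighbours)
        labels[orf.name] = "paired 2CS" if has_kinase else "orphan regulator"
    return labels


# Hypothetical toy genome; names and positions are invented for illustration.
example = [
    Orf("regA", 10, frozenset({"Response_reg", "Trans_reg_C"})),
    Orf("kinA", 11, frozenset({"HisKA"})),
    Orf("regB", 40, frozenset({"Response_reg", "Trans_reg_C"})),
    Orf("regC", 70, frozenset({"Response_reg"})),  # receiver only, not OmpR-type
]
labels = classify_regulators(example)
```

In this toy input, regA is labelled as paired because the adjacent ORF carries HisKA, regB is labelled as an orphan, and regC is ignored because it lacks the OmpR C-terminal domain.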
Multiple sequence analysis and estimation of phylogeny
The phylogeny of the selected organisms was inferred by
maximum-likelihood analysis of the 16S rRNA sequences, which were obtained from the
European ribosomal RNA database (Wuyts et al., 2002). Sequence alignments were
performed with ClustalW (Thompson et al., 1997) and phylogenetic analyses were
performed using PAUP* 4.0 (Swofford, 1998). A maximum-likelihood tree was
obtained based on a general time reversible (GTR) model. The tree was rooted with
the eukaryotic lineage of Arabidopsis and yeast. The genome sequence and ORFs of
Klebsiella pneumoniae str. NTUH-K2044 were kindly provided by Dr. Shih-Feng Tsai
at NHRI, Taiwan (http://genome.nhri.org.tw/kp/).
Sequence logos
The amino acid sequences around the phosphorylated histidine residue of each
histidine kinase were identified after multiple sequence alignment, which was
performed using ClustalW with the BLOSUM62 matrix (Henikoff and Henikoff, 1992)
according to the parameters used by Koretke et al. (2000). Sequences
14 amino acids in length were collected, and graphical representations of the multiple
sequence alignments, the sequence logos, were generated using WebLogo (Crooks et
al., 2004).
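What a sequence logo displays can be illustrated with a short sketch that computes the per-column information content of aligned 14-residue windows. This is a simplified calculation, assuming a 20-letter amino acid alphabet and omitting the small-sample correction that WebLogo applies; the function name and the three toy windows are hypothetical and are not sequences from this study.

```python
import math
from collections import Counter


def column_information(windows):
    """Return the information content (bits) of each alignment column.

    Information is log2(20) minus the Shannon entropy of the residues in the
    column; gap characters ('-') are ignored. No small-sample correction is
    applied, so values are slightly higher than a WebLogo plot would show.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same aligned length")
    info = []
    for col in range(length):
        counts = Counter(w[col] for w in windows if w[col] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # column made only of gaps carries no information
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(math.log2(20) - entropy)
    return info


# Hypothetical 14-residue windows with the histidine at the third position.
windows = ["LAHELRTPLTSIRG", "VSHELRTPLAAIRG", "IAHDLRTPLTSIKG"]
info = column_information(windows)
```

A fully conserved column, such as the histidine in the toy windows, reaches the maximum of log2(20) bits, whereas a column with three different residues scores log2(20) minus log2(3) bits.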
3.3 Results and Discussions
Compilation of the 2CSs of the OmpR family
The ORFs from 38 species, comprising 5 archaea and 33 eubacteria, were collected
from GenBank. In the maximum-likelihood analysis based on the 16S
rRNA sequences, the chosen species were placed in clades
that correspond to the major taxa, which include the archaea, purple bacteria
(proteobacteria), Gram-positives, Actinomycetes, Cyanobacteria, Deinococcus, and
Thermotogales (Figure 14). Consistent with the canonical view of bacterial
taxonomy, this phylogeny indicated that the chosen species represent a wide
diversity of archaea and eubacteria. We first searched the ORFs using
HMMER (Durbin et al., 1998) for three domains: the transmitter domain
of the sensor kinase, the receiver domain of the response regulator, and the C-terminal
DNA-binding domain of the OmpR response regulator. As shown in Table 3, a bacterium
that carries more HisKA domains (histidine kinases) generally contains more receiver
domains (response regulators). Most closely related species, for example those of the
Salmonella genus, appeared to have similar numbers of 2CSs. With the exception of the
5 archaeal genomes, all genomes appeared to carry a number of OmpR-like regulators.
gene mostly in a regulator-to-sensor transcriptional orientation. Some of the
OmpR-like regulator genes do not have a linked kinase gene and were hence named orphan
regulators (OP). A relatively small portion of the OmpR-like regulator-encoding genes
showed a transcriptional orientation different from that of their accompanying sensor genes.
Interestingly, more orphan OmpR-like regulators were found in the cyanobacteria,
including Synechococcus, Synechocystis and Prochlorococcus.
The conventional view of the universal tree is that archaea and eukaryotes are sister
groups rooted in the bacteria, and that the three form separate monophyletic groups
(Brown and Doolittle, 1997). Genes encoding 2CSs have been identified in all three
domains of life (Woese et al., 1990) but are mostly present in bacteria. Among
eukaryotes, only yeast, fungi, slime molds, and plants appear to contain 2CSs. Only
a single histidine kinase and three response regulators were found in the completed
genome of the yeast Saccharomyces cerevisiae (Koretke et al., 2000). In our analysis, no
OmpR-like response regulator was found in the five archaeal genomes, which
suggested that the OmpR 2CS genes originated from a common ancestor in
bacteria and then spread to most, but not all, bacterial species. These genes were
also passed to some eukaryotes, probably through horizontal gene transfer
events. Conservation of the interacting domain of the sensor kinases suggested that,
although co-evolution of the paired 2CS genes of the OmpR family is the major
scheme, recruitment of interacting components from distantly located sensor and
response regulator genes may still be important for flexible regulation in bacteria.
Correlation of the 2CSs number with the genome size
As shown in Figure 15, our analysis indicated that the numbers of 2CS
components, including sensor kinases, response regulators, and OmpR-like regulators,
were positively correlated with genome size. The P. aeruginosa genome
appeared to carry the largest number of 2CS genes among the chosen species. This
may explain the flexible habitats of this bacterium, which, as an opportunistic pathogen,
is not only able to survive in various environments but also adapts rapidly to
changing host conditions (Rodrigue et al., 2000). The alpha-proteobacterium
Rhodopseudomonas palustris, which is among the most metabolically versatile
bacteria, capable of photoautotrophic, photoheterotrophic, chemoheterotrophic, or
chemoautotrophic growth (Larimer et al., 2004), also harbored a large number of 2CS
genes. Streptomyces coelicolor A3(2), a member of the most numerous and ubiquitous
group of soil bacteria (Bentley et al., 2002), likewise carried more than one hundred
2CS genes. This suggests that a relatively large number of 2CS genes in a
genome allows the bacterium to grow in many different environments.
underestimation of the number, the numbers in Table 3 appeared to be smaller than
that of the annotation results in some of the genomes. In contrast, the numbers of
response regulators may be slightly overestimated, since some 2CSs occur as
hybrids in which a single protein contains the transmitter domain of a sensor fused with a
response regulator receiver domain. However, the hybrid sensors account
for only a minor portion of the sensor kinases, which should not significantly bias
the cross-species comparison.
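The positive correlation between genome size and the number of 2CS components described in this section could be quantified as sketched below. The text does not say which statistic underlies Figure 15, so the use of the Pearson correlation coefficient is an assumption, and the genome sizes and gene counts are invented toy values rather than entries from Table 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: genome sizes in Mb and counts of 2CS genes.
genome_size_mb = [1.0, 2.0, 3.0, 4.0]
two_cs_genes = [2, 4, 6, 8]
r = pearson_r(genome_size_mb, two_cs_genes)
```

A value of r close to +1 indicates that larger genomes carry proportionally more 2CS genes; with real data the coefficient would be expected to fall below the perfect correlation of this toy input.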
Gene organization of the 2CS of OmpR-family
We have reported recently that, in the P. aeruginosa PAO1 genome, most of the
2CSs containing an OmpR-type regulator are organized as operons with the gene order
regulator-to-kinase (RS) (Chen et al., 2004, in press). This phenomenon was also
seen in the analysis of the 2CSs in B. subtilis (Fabret et al., 1999). In the 38 genomes
analyzed, we identified 325 ORFs that contained both a response regulator
receiver domain and an OmpR-like C-terminal domain. Among them, 258 (79.3%)
were accompanied by a sensor kinase gene in the same transcriptional direction, 50
appeared to be orphan regulator genes, and the other 14 were transcribed in a direction
different from that of the accompanying sensor genes. Of the 258 regulator-sensor pairs,
220 (85.3%) had the gene order regulator-to-sensor (RS). The majority of the less common
sensor-to-regulator (SR) 2CSs are homologs of kdpDE and baeSR (Polarek et al.,
1992; Baranova and Nikaido, 2002), which implies that most of these SR-type
2CSs probably also share a common ancestry.
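The distinction between RS, SR, and oppositely oriented pairs can be sketched from gene coordinates and strand. The `Gene` record, the label "opposite", and the coordinates below are hypothetical illustrations and not the implementation used for this analysis; the sketch only assumes that, on the minus strand, the gene with the higher coordinate is transcribed first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost genome coordinate
    end: int     # rightmost genome coordinate
    strand: str  # "+" or "-"


def gene_order(regulator, sensor):
    """Return 'RS', 'SR', or 'opposite' for a regulator-sensor gene pair."""
    if regulator.strand != sensor.strand:
        return "opposite"  # genes transcribed in different directions
    regulator_first = regulator.start < sensor.start
    if regulator.strand == "-":
        # Transcription on the minus strand runs from high to low coordinates.
        regulator_first = not regulator_first
    return "RS" if regulator_first else "SR"


# Hypothetical coordinates for three gene pairs.
plus_pair = gene_order(Gene(100, 800, "+"), Gene(810, 2200, "+"))
minus_pair = gene_order(Gene(1500, 2200, "-"), Gene(100, 1490, "-"))
sr_pair = gene_order(Gene(2300, 3000, "+"), Gene(100, 2290, "+"))
divergent = gene_order(Gene(100, 800, "-"), Gene(900, 2200, "+"))
```

Here the first two pairs are both classified as RS even though their coordinates are reversed, because the strand determines which gene lies upstream.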
Analysis of sequences flanking the phosphorylated histidine residue of the sensor
kinases
Signal transduction between a sensor and a regulator that are encoded by
distantly located genes has been reported in the sporulation system of B. subtilis and
the hybrid sensor systems in E. coli (Kobayashi et al., 1995; Hoch and Silhavy, 1995).
The 2CS genes involved in the same signaling pathway may later evolve into an
operon for the benefit of coordinated control. We have shown here that the SR gene
order is not uncommon in bacterial genomes. Most likely, like the abundant
RS 2CSs, these SR 2CSs evolved from a common ancestor with the SR gene order.
It has been proposed that, for all the RS-2CSs of the OmpR family in B. subtilis, the
sequences surrounding the phosphorylated histidine residue of the sensor kinases
co-evolved with the cognate response regulator (Fabret et al., 1999). To determine whether
the co-evolution scheme also applies to the SR-2CSs of the OmpR family, the amino acid
sequences for RS and SR 2CS sensors from the 38 genomes were aligned using
by Koretke et al. As shown in Figure 16, the sequence logos derived from the aligned
sequences for both the RS and SR groups appeared to be nearly identical, which
suggested that the interaction of the sensor and regulator components might have
been preserved in the sequences before the formation of either RS- or SR-2CSs.
In contrast to the OmpR family, 2CSs of other regulator families such as NtrC
and NarL showed no obvious evidence of co-evolution with the sequences of their
cognate sensor kinases (Chen et al., 2004). Their relatively small numbers and
lack of a uniform gene organization with the cognate sensor
component nearby in the genome probably obscured the clues of co-evolution. By
compiling the OmpR-family 2CSs from 38 species in this study, we found that,
irrespective of gene order, the constraint on their interacting domains, namely the
domain containing the phosphorylated histidine residue of the sensor kinase and the
output domain of the response regulator, appeared to be preserved. Since the sequences
flanking the phosphorylated aspartate residue are highly conserved, the response
regulators were classified by the relatedness of their output domains. It has been shown
that only a small number of residues around the active site of the B. subtilis response
regulator Spo0F affect its specificity of interaction, and therefore the sequences of the
receiver domain alone do not allow sufficient distinction among the response
regulators (Tzeng and Hoch, 1997). Recently, by classifying the response
regulators of B. subtilis and E. coli using the surface hydrophobicity of their receiver
domains, a significant correlation was shown between the receiver domain sub-class
and the sensor kinase classification (Kojetin et al., 2003). In addition, the linker
region that joins the receiver domain and the output domain was also shown to play an
important role both in the phosphorelay and in the output of the signal (Mattison et al.,
2002). Although we were not able to show a correlation between the two components
by sequence comparison of the receiver domain, the
correlation between the classification of the kinase and the output domain of the
response regulator plausibly supports the presence of a constraint
between the two interacting components during evolution.
Summary
The plasmid pLVPK (GenBank accession no. AY378100) appeared to be the
largest (219 kb) ever reported in K. pneumoniae. According to the GenBank records as of
August 2004, it is the 3rd largest sequenced plasmid among the gamma-proteobacteria
and the 6th largest among the 532 sequenced bacterial plasmids. Homology analysis is
no doubt the central methodology of genomics and produces the bulk of useful
information. However, approximately two-thirds of the ORFs in pLVPK were annotated as
"conserved hypothetical" or "hypothetical" proteins, for which there were no
functional predictions at all. Apparently, there is ample room for improvement in
computational annotation. Several recently developed approaches, known as genome
context analysis, have gone beyond sequence or structure comparison. In genome
context analysis, genes that have no experimentally characterized homologs can
be assigned to particular cellular systems or pathways on the basis of associations such
as phyletic profiles of protein families, domain fusions in multi-domain proteins, gene
adjacency in genomes, and expression patterns (Huynen et al., 2000; Galperin and
Koonin, 2000). Unlike traditional homology analysis, the results produced by these
methods are often very intuitive. It is foreseeable that, with more genomic information
available, context-based methods will substantially complement the traditional
methods and improve the situation.
We have shown that the virulence-related genes encoding the siderophore
system and the mucoid regulator constitute two PAI-like regions in pLVPK. As
shown in Appendix I, the first PAI-like region includes iuc, vag, shiF and
rmpA2, flanked by transposable elements and short sequences similar to the
3’-sequences of E. coli tRNAs. The second PAI-like region contains the rmpA and iro
genes together with a set of insertion sequence genes near the iro operon. Since both
PAI-like regions contain virulence-associated genes and potentially mobile
elements, a study focusing on both their contribution to virulence and their
prevalence among different K. pneumoniae isolates is being carried out in our
laboratory. Many questions arising from the sequence analysis await
answers, such as: How exactly are the gene clusters contained in the PAI-like
regions orchestrated under different environmental conditions? Why do the DNA
sequences of the two mucoid regulator genes, rmpA and rmpA2, have such a low
GC% and biased codon usage? What are the functions of the putative ORFs in this
plasmid? The availability of the completed sequence and annotation information
promises much for the study of the pathogenesis of K. pneumoniae.
In the second chapter, our results supported the idea that most of the 2CS gene
regulator pair in P. aeruginosa PAO1. We have shown that similar biological
functions were preserved for the closely related 2CSs in the congruent clade; however,
different transcriptional control was suggested to prevent functional redundancy.
The analysis of hybrid kinases, on the other hand, supported the recruitment model.
A co-evolution pattern has also been reported for other interacting proteins such as
neuropeptides and their receptors (Darlinson and Richter 1999) and the archaeal
chaperonin subunits (Archibald et al., 1999), in studies that attempted to provide a
global view of the linkage of proteins to their biochemical relevance in a genome.
In the third chapter, we reported our evolutionary analysis of the
2CSs of the OmpR family among 38 species. We found evidence of an
evolutionary constraint on the interacting 2CS components, the sensor and the cognate
regulator. We propose that, for an orphan 2CS of the OmpR family, the cognate
kinase could be assigned on the basis of the sequences identified. Although several
studies have attempted to categorize response regulators and sensors by sequence
similarity (Grebe and Stock, 1999; Fabret et al., 1999; Volz, 1993), the classification
of response regulators remains an unsolved question, since the receiver
domain containing the phosphorylated aspartate is highly conserved. Nevertheless,
investigation of the evolutionary constraints on these interacting modules increases
our knowledge of the evolution of protein-protein interactions.
In conclusion, these studies provide fundamental insights for the research of
bacterial pathogenesis and the molecular evolutionary basis for two-component signal
transduction systems.
References
Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B,
Guild BC, deJonge BL et al. (1999) Genomic-sequence comparison of two
unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature
397, 176-180.
Archibald JM, Logsdon JM, and Doolittle WF (1999) Recurrent paralogy in the
evolution of archaeal chaperonins. Curr. Biol. 9, 1053-1056.
Ambrozic J, Ostroversnik A, Starcic M, Kuhar I, Grabnar M, Zgur-Bertok D (1998)
Escherichia coli ColV plasmid pRK100: genetic organization, stability and
conjugal transfer. Microbiology 144, 343-352.
Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M (2003)
G-language Genome Analysis Environment: a workbench for nucleotide
sequence data mining. Bioinformatics 19, 305-306.
Baranova N, and Nikaido H (2002) The BaeSR two-component regulator system
activates transcription of the yegMNOB (mdtABCD) transporter gene cluster
in Escherichia coli and increases its resistance to novobiocin and
deoxycholate. J. Bacteriol. 184, 4168-4176.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A,
Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, and Eddy
SR (2004) The Pfam Protein Families Database. Nucleic Acids Res. Database
Issue 32:D138-D141
Bentley SD, Chater KF, Cerdeno-Tarraga A-M, Challis GL, Thomson NR, James KD,
Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G,
Chen CA, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T,
Howarth S, Huang C-H, Kieser T, Larke L, Murphy L, Oliver K, O’Neil S,
Rabbinowitsch E, Rajandream M-A, Rutherford K, Rutter S, Seeger K,
Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek
A, Woodward J, Barrell BG, Parkhill J, and Hopwood DA (2002) Complete
genome sequence of the model actinomycete Streptomyces coelicolor A3(2).
Nature 417, 141-147.
Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides
J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome
sequence of Escherichia coli K-12. Science 277, 1453-1462.
Bock A, Gross R (2001) The BvgAS two-component system of Bordetella spp.: a
versatile modulator of virulence gene expression. Int. J. Med. Microbiol. 291,
119-130.
and functional analysis of the pbr lead resistance determinant of Ralstonia