Protein interaction specificity and conservation of the gene organization of the two-component systems

– An analysis of the OmpR-family

3.1 Introduction

Two-component signal transduction systems (2CSs) are commonly involved in the

regulation of adaptive responses in bacteria. Such a system typically contains a sensor

histidine kinase and a response regulator protein. Signals activate the sensor kinase by

causing an autophosphorylation on the conserved histidine residue. The

phosphorylated sensor then transfers this phosphate group to a conserved aspartic acid

residue of the corresponding response regulator, which thereby modulates the

subsequent cellular responses (Hoch, 2000). Most of the sensor kinases consist of a

highly variable membrane sensor domain that is connected to the conserved histidine

phosphotransferase region by a cytoplasmic linker segment. The response regulator,

in most cases, consists of an amino-terminal receiver domain followed by a linker

region and a carboxyl-terminal DNA-binding domain (Fabret et al., 1999; Hoch,

2000).

Analysis of the 2CSs in Bacillus subtilis revealed that the classification of

regulator families could be correlated with the sequences surrounding the

phosphorylated histidine residue of the cognate kinase, which suggested that the

catalytic domain of kinase and DNA-binding domain of the response regulator

2CS genes in Enterococcus faecalis (Hancock and Perego, 2002). On the basis of

sequence homology in the DNA binding domain of the response regulators (Mizuno,

1997), two major families, OmpR and NarL, were identified in Pseudomonas

aeruginosa PAO1. We have shown that the gene order of regulator-to-sensor

transcription unit is preserved within the 2CSs of OmpR-family in the genome. The

conserved sequences surrounding the active-site histidine of the sensor kinases were

also shown to correlate with the classification of their corresponding response regulators

(Chen et al., 2004), which implied that the protein interaction specificity and gene

organization were preserved in the evolution of the 2CSs of OmpR family.

The OmpR-like proteins are among the most widespread transcriptional

regulators and constitute the largest family of the response regulators (Chang and

Stewart, 1998). The family contains at least 15 proteins with extensive amino acid

sequence similarities to that of OmpR (Mizuno and Tanaka, 1997), including PhoP,

PhoB, KdpE, ArcA, and CreB. The primary osmosensor EnvZ and the response

regulator OmpR are the prototype of the 2CS required for transducing an osmotic

stress signal in bacteria. Upon activation, OmpR functions as a transcriptional factor

to modulate the expression of ompF and ompC genes that encode porins for

controlling the cell membrane permeability (Roberts et al., 1994; Delgado et al.,

1993). Other members of the OmpR family appear to be involved in diverse

functions such as virulence (PhoQ/PhoP), phosphate regulation (PhoR/PhoB),

and anaerobic nitrite reduction (ResD/ResE) (Hoch and Silhavy, 1995).

Bacteria generally possess multiple 2CSs. To ensure a specific signal

transduction, it is conceivable that a preserved specificity of the interacting domains is

present between the kinase and the corresponding response regulator. We are

interested in identifying the evolutionary constraint between the functionally coupled

2CS components, the sensor kinase and the cognate regulator. We report herein an

extended analysis, concerning the gene order and the sequence conservation in the

interacting domain, of the 2CSs of OmpR-family from 38 species, chosen to represent

the diversity. Our results indicated that the specificity between the 2CS components

may have formed before assembly of the genes to form an operon.

3.2 Materials and Methods

Database searches

The completed genomic sequences of archaea (5 species), gamma-

proteobacteria (10 species), beta-proteobacteria (5 species), Chlorobi (1 species),

proteobacteria delta-epsilon subdivision (1 species), alpha-proteobacteria (1 species),

Gram-positive bacteria (5 species), Actinobacteria (4 species), Deinococcus (1

species), Cyanobacteria (3 species), Aquificae (1 species), and Thermotogae (1

species) were collected to search for the proteins containing respectively the sensor

kinase transmitter domain, response regulator receiver domain, and the DNA-binding

domain of OmpR-type regulator. The HMMs for the transmitter domain (HisKA.hmm),

receiver domain (Response_reg.hmm), and OmpR C-terminal domain (Trans_reg_C.hmm)

were obtained from the Pfam database (http://www.sanger.ac.uk/Software/Pfam/)

(Bateman et al., 2004). To look for these domains in the query sequences of total

ORFs from the collected genomes, HMMSEARCH (HMMER version 2.2) (Durbin et

al., 1998) was used.

The ORFs that contain both the receiver domain and OmpR C-terminal domain

were designated as “OmpR-type” regulators. The regulator-encoding genes that

had a nearby accompanying ORF bearing a histidine kinase transmitter

domain were noted as “paired 2CS”. In contrast, the regulator-encoding genes

without a physically linked sensor gene were designated as “orphan regulators”.
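
The designation rules above can be sketched in Python as follows. This is only an illustrative sketch, not the procedure actually used in this study: the `Orf` record, the `max_gap` neighbourhood (the text says only "nearby"), and the toy ORF names are assumptions; only the three Pfam model names come from the text.

```python
from dataclasses import dataclass, field

# Pfam model names as given in the text.
RECEIVER = "Response_reg"
OMPR_C = "Trans_reg_C"
TRANSMITTER = "HisKA"


@dataclass
class Orf:
    """Hypothetical record: an ORF, its order along the genome, and its domain hits."""
    name: str
    index: int
    domains: set = field(default_factory=set)


def is_ompr_type(orf):
    """An OmpR-type regulator carries both the receiver and the OmpR C-terminal domain."""
    return RECEIVER in orf.domains and OMPR_C in orf.domains


def classify_regulators(orfs, max_gap=1):
    """Label each OmpR-type regulator as 'paired 2CS' or 'orphan regulator'.

    max_gap is an assumed threshold for "nearby": how many ORF positions
    away a transmitter-bearing ORF may lie and still count as linked.
    """
    by_index = {orf.index: orf for orf in orfs}
    labels = {}
    for orf in orfs:
        if not is_ompr_type(orf):
            continue
        offsets = [d for d in range(-max_gap, max_gap + 1) if d != 0]
        neighbours = [by_index.get(orf.index + d) for d in offsets]
        paired = any(n is not None and TRANSMITTER in n.domains for n in neighbours)
        labels[orf.name] = "paired 2CS" if paired else "orphan regulator"
    return labels


# Toy genome (hypothetical names and positions).
toy_orfs = [
    Orf("regA", 1, {RECEIVER, OMPR_C}),
    Orf("kinA", 2, {TRANSMITTER}),
    Orf("orfX", 3),
    Orf("regB", 5, {RECEIVER, OMPR_C}),
    Orf("regC", 8, {RECEIVER}),  # receiver only, so not OmpR-type
]
toy_labels = classify_regulators(toy_orfs)
```

Here `regA` is labelled as a paired 2CS because `kinA` lies next to it, `regB` is labelled as an orphan regulator, and `regC` is skipped because it lacks the OmpR C-terminal domain.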

Multiple sequence analysis and estimation of phylogeny

The phylogeny of the selected organisms was produced based on

maximum-likelihood analysis of the 16S rRNA sequences, which were obtained from the

European ribosomal RNA database (Wuyts et al., 2002). Sequence alignments were

performed by ClustalW (Thompson et al., 1997) and phylogenetic analyses were

performed using PAUP* 4.0 (Swofford, 1998). A maximum likelihood tree was

obtained based on a general time reversible (GTR) model. The tree was rooted with

the eukaryotic lineage of Arabidopsis and yeast. The genome sequence and ORFs of

Klebsiella pneumoniae str. NTUH-K2044 were kindly provided by Dr. Shih-Feng Tsai

at NHRI, Taiwan (http://genome.nhri.org.tw/kp/).

Sequence logos

The amino acid sequences around the phosphorylated histidine residue of each

histidine kinase were identified after multiple sequence alignment, which was

performed using ClustalW with the BLOSUM62 matrix (Henikoff and Henikoff, 1992)

according to the parameters used by Koretke et al. (2000). Sequences with a

length of 14 amino acids were collected, and graphical representations of the multiple

sequence alignments, the sequence logos, were presented using WebLogo (Crooks et

al., 2004).
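
As a rough illustration of what a sequence logo encodes, the sketch below computes the per-column information content (in bits) of a set of aligned, equal-length windows. It is a simplified sketch and not the WebLogo implementation: it omits the small-sample correction, and the function name and the toy windows are assumptions made for illustration (the toy windows are 4 residues long rather than the 14 used in this study).

```python
import math
from collections import Counter


def column_information(windows, alphabet_size=20):
    """Return the information content (bits) of each alignment column.

    Information = log2(alphabet_size) - Shannon entropy of the column.
    No small-sample correction is applied (a simplification).
    """
    if not windows:
        raise ValueError("at least one window is required")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")

    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*windows):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


# Toy windows (hypothetical); the second column is an invariant histidine.
toy_windows = ["AHEL", "GHEL"]
toy_info = column_information(toy_windows)
```

A fully conserved column, such as the histidine here, reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.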

3.3 Results and Discussions

Compilation of the 2CSs of the OmpR family

The ORFs from 38 species, including 5 archaea and 33 eubacteria, were collected

from GenBank. In the maximum-likelihood phylogeny based on the 16S

rRNA sequences, the chosen species were placed in clades

that correspond to the major taxa, which include the archaea, purple bacteria

(proteobacteria), Gram-positives, Actinomycetes, Cyanobacteria, Deinococcus, and

Thermotogales (Figure 14). Consistent with the canonical view of bacterial

taxonomy, this phylogeny indicated that the chosen species represent a wide

diversity of archaea and eubacteria. We first searched the ORFs using

HMMER (Durbin et al., 1998) for the three domains including the transmitter domain

of sensor kinase, the receiver domain of response regulator, and the C-terminal

DNA-binding domain of OmpR response regulator. As shown in Table 3, a bacterium

that carries more HisKA (histidine kinases) generally contains more Receivers

(response regulators). Most of the closely-related species, for example, those of the

Salmonella genus, appeared to have similar numbers of 2CSs. Except for the 5 archaeal

genomes, all of the genomes appeared to carry a number of OmpR-like regulators.

gene mostly in a regulator-to-sensor transcriptional orientation. Some of the

OmpR-like regulator genes do not have a linked kinase gene and were hence named orphan

regulators (OP). A relatively small portion of the OmpR-like regulator-encoding genes

showed a transcriptional orientation different from that of their accompanying sensor genes.

Interestingly, more orphan OmpR-like regulators were found in the cyanobacteria,

including Synechococcus, Synechocystis and Prochlorococcus.

The conventional view of the universal tree is that archaea and eukaryotes are sister

groups rooted in the bacteria, and the three were separated into monophyletic groups

(Brown and Doolittle, 1997). The genes encoding 2CSs were identified in all three

domains of life (Woese et al., 1990); however, they are mostly present in bacteria. Among

eukaryotes, only yeast, fungi, slime molds, and plants appeared to contain 2CSs. Only

a single histidine kinase and three response regulators were found in the completed

yeast Saccharomyces cerevisiae genome (Koretke et al., 2000). In our analysis, no

OmpR-like response regulator was found in the five archaeal genomes, which

suggested that the OmpR 2CS genes originated from a common ancestor in

bacteria and then spread to most, but not all, of the bacterial species. These genes were

also passed to some of the eukaryotes, probably through horizontal gene transfer

events. Conservation of the interacting domain of the sensor kinases suggested that,

although co-evolution of the paired 2CS genes of the OmpR-family is the major

scheme, recruitment of the interacting components from distantly located sensor and

response regulator genes may still be important for a flexible regulation in bacteria.

Correlation of the 2CSs number with the genome size

As shown in Figure 15, our analysis indicated that the numbers of 2CS

components including sensor kinases, response regulators, and OmpR-like regulators

were positively correlated with their genome sizes. The P. aeruginosa genome

appeared to carry the largest number of 2CS genes among the chosen species. This

may explain the flexible lifestyle of the bacterium, which, as an opportunistic pathogen,

is able not only to survive in various environments but also to adapt rapidly to

changing host conditions (Rodrigue et al., 2000). The alpha-proteobacterium

Rhodopseudomonas palustris, which is among the most metabolically versatile

bacteria capable of photoautotrophic, photoheterotrophic, chemoheterotrophic, or

chemoautotrophic growth (Larimer et al., 2004), also harbored a large number of 2CS

genes. Streptomyces coelicolor A3(2), a member of the most numerous and ubiquitous

soil bacteria (Bentley et al., 2002), likewise carried more than a hundred 2CS genes.

It could be concluded that a relatively large number of 2CS genes in a

genome allows the bacterium to grow in many different environments.

underestimation of the number, the numbers in Table 3 appeared to be smaller than

that of the annotation results in some of the genomes. In contrast, the numbers of

response regulators may be slightly overestimated since some of the 2CSs appear as a

hybrid in which a protein contains the transmitter domain of a sensor fused with a

response regulator receiver domain. However, the hybrid sensors only account

for a minor portion of the sensor kinases, which should not result in significant bias to

the cross-species comparison.

Gene organization of the 2CS of OmpR-family

We have reported recently that, in the P. aeruginosa PAO1 genome, most of the

OmpR-type regulator-containing 2CSs, organized as operons, showed the gene order

of regulator-to-kinase (RS) (Chen et al., 2004, in press). This phenomenon was also

seen in the analysis of the 2CSs in B. subtilis (Fabret et al., 1999). In the 38 genomes

analyzed, we have identified 325 ORFs that contained both a response regulator

receiver domain and an OmpR-like C-terminal domain. Among them, 258 (79.3%)

carried an accompanying sensor kinase gene in the same transcriptional direction, 50 appeared

to be orphan regulator genes, and the other 14 showed transcriptional directions different

from those of the accompanying sensor genes. Of the 258 regulator-sensor pairs, 220 (85.3%)

formed a gene order of regulator-to-sensor (RS). The majority of the less frequent

sensor-to-regulator (SR) 2CSs are homologs of kdpDE and baeSR (Polarek et al.,

1992; Baranova and Nikaido, 2002), which implies that the majority of these SR-type

2CSs probably also have a shared ancestry.
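
The three categories used above (RS, SR, and different transcriptional directions) can be expressed as a small decision rule. The sketch below is a hypothetical illustration, not the script used in this study; it assumes each gene is described by a start coordinate and a strand, and that on the minus strand the upstream gene is the one with the larger coordinate.

```python
def gene_order(regulator, sensor):
    """Classify a linked regulator/sensor gene pair.

    Each argument is a (start, strand) tuple with strand '+' or '-'.
    Returns 'RS' (regulator-to-sensor), 'SR' (sensor-to-regulator),
    or 'different orientation' when the two genes are on opposite strands.
    """
    r_start, r_strand = regulator
    s_start, s_strand = sensor
    if r_strand != s_strand:
        return "different orientation"
    if r_strand == "+":
        # Transcription runs toward increasing coordinates.
        regulator_first = r_start < s_start
    else:
        # Transcription runs toward decreasing coordinates.
        regulator_first = r_start > s_start
    return "RS" if regulator_first else "SR"


# Toy coordinates (hypothetical).
examples = {
    "plus strand, regulator upstream": gene_order((100, "+"), (900, "+")),
    "minus strand, regulator upstream": gene_order((900, "-"), (100, "-")),
    "minus strand, sensor upstream": gene_order((100, "-"), (900, "-")),
    "opposite strands": gene_order((100, "+"), (900, "-")),
}
```

Comparing coordinates without regard to strand would mislabel every minus-strand pair, which is why the strand is checked first.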

Analysis of sequences flanking the phosphorylated histidine residue of the sensor

kinases

Signal transduction between the sensor and regulator which are encoded by

distantly located genes has been reported in the sporulation system of B. subtilis and

the hybrid sensor systems in E. coli (Kobayashi et al., 1995; Hoch and Silhavy, 1995).

The 2CS genes involved in the same signaling pathway may later evolve into an

operon for the benefit of coordinated control. We have shown here that the SR gene

order was not uncommon in the bacterial genomes. Most likely, like the abundant

RS 2CSs, these SR 2CSs evolved from a common ancestor with the SR gene order.

It has been proposed that, for all the RS-2CSs of OmpR-family in B. subtilis, the

sequences surrounding the phosphorylated histidine residue of the sensor kinases were

co-evolved with the cognate response regulator (Fabret et al., 1999). To determine whether

the co-evolution scheme also applies to the SR-2CSs of OmpR-family, the amino acid

sequences for RS and SR 2CS sensors from the 38 genomes were aligned using

by Koretke et al. As shown in Figure 16, the sequence logos derived from the aligned

sequences for both RS and SR groups appeared to be nearly identical, which

suggested that the interaction of the sensor and regulator components might have

been preserved in the sequences before the formation of either RS- or SR-2CS.

In contrast to the OmpR-family, 2CSs of the other regulator families such as NtrC

and NarL showed no obvious evidence of co-evolution with the sequences of their

cognate sensor kinases (Chen et al., 2004). Their relatively small numbers and

lack of a uniform gene organization with the cognate sensor

component nearby in the genome probably obscured the clues of co-evolution. By

compilation of 2CSs of OmpR-family from 38 species in this study, irrespective of the

gene orders, the constraint in their interacting domains, the domain that contained

the phosphorylated histidine residue of the sensor kinase and the output domain of

the response regulator, appeared to be preserved. Since the sequences flanking the

phosphorylated aspartate residue are highly conserved, the response regulators were

classified by the relatedness of their output domains. It has been shown that only a

small number of residues around the active sites of B. subtilis response regulator

Spo0F affected its specificity of interaction, and therefore the sequences at the

receiver domain alone could not allow sufficient distinction of the response

regulators (Tzeng and Hoch, 1997). Recently, by classification of the response

regulators of B. subtilis and E. coli using surface hydrophobicity of their receiver

domains, significant correlation was shown between the receiver domain sub-class

and the sensor kinase classifications (Kojetin et al., 2003). In addition, the linker

region that joins the receiver domain and output domain was also shown to play an

important role both in the phosphorelay and output of the signal (Mattison et al.,

2002). Although we were not able to show the correlation of the two components

by sequence comparison of the receiver domain, it is plausible that the

correlation between the classification of the kinase and the output domain of the

response regulator supports the presence of a constraint

between the two interacting components during evolution.

Summary

The plasmid pLVPK (GenBank accession no. AY378100) appeared to be the

largest (219 kb) ever reported in K. pneumoniae. According to the GenBank records as of

August 2004, it is the 3rd largest sequenced plasmid among the gamma-proteobacteria,

and the 6th largest among the 532 sequenced bacterial plasmids. Homology analysis is

no doubt the central methodology of genomics that produces the bulk of useful

information. However, approximately 2/3 of the ORFs in pLVPK were annotated as

"conserved hypothetical" and "hypothetical" proteins, for which there were no

functional predictions at all. Apparently, there is ample room for improvement in

computational annotation. Several recently developed approaches, known as genome

context analysis, have gone beyond sequence or structure comparison. In genome

context analysis, the genes that have no experimentally characterized homologs can

be assigned to particular cellular systems or pathways based on the associations, such

as phyletic profiles of protein families, domain fusions in multi-domain proteins, gene

adjacency in genomes, and expression patterns (Huynen et al., 2000; Galperin and

Koonin, 2000). Unlike the traditional homology analysis, results produced by these

methods are often very intuitive. It is foreseeable that, with more genomic information

available, context-based methods will substantially complement the traditional

methods and improve the situation.

We have shown that the virulence-related genes encoding the siderophore

system and the mucoid regulator constitute two PAI-like regions in pLVPK. As

shown in Appendix I, the first PAI-like region includes the iuc, vag, shiF and

rmpA2 genes flanked by transposable elements and short sequences similar to the

3’-sequences of E. coli tRNAs. The second PAI-like region contains rmpA and iro

genes together with a set of insertion sequence genes near the iro operon. Since both

of the PAI-like regions contain virulence-associated genes and potentially mobile

elements, studies focusing on both their contribution to virulence and their

prevalence among different K. pneumoniae isolates are being carried out in our

laboratory. Many questions arising from the sequence analysis remain to be

answered, such as: How exactly are the gene clusters contained in the PAI-like

regions orchestrated under different environmental conditions? Why do the DNA

sequences of the two mucoid regulator genes, rmpA and rmpA2, have such a low

GC% and biased codon usage? What are the functions of the putative ORFs in this

plasmid? The availability of the completed sequence and annotation information

promises much for the study of the pathogenesis of K. pneumoniae.

In the second chapter, our results supported the idea that most of the 2CS gene

regulator pair in P. aeruginosa PAO1. We have shown that similar biological

functions were preserved for the closely-related 2CSs in the congruent clade; however,

different transcriptional control was suggested to prevent functional redundancy.

The analysis of hybrid kinases, on the other hand, supported the recruitment model.

Co-evolution patterns have also been reported for other interacting proteins such as

neuropeptides and their receptors (Darlison and Richter, 1999), and the archaeal

chaperonin subunits (Archibald et al., 1999), in studies that attempted to provide a

global view of protein linkage and its biochemical relevance in a genome.

In the third chapter, we reported our evolutionary analysis of the

2CSs of OmpR-family among 38 species. We have found the existence of an

evolutionary constraint on the interacting 2CS components, the sensor and the cognate

regulator. We propose here that, for an orphan 2CS of the OmpR family, its cognate

kinase could be assigned on the basis of the sequences identified. Although several

studies have attempted to categorize response regulators and sensors by sequence

similarities (Grebe and Stock, 1999; Fabret et al., 1999; Volz, 1993), the classification

of response regulators has remained an unsolved question since the receiver

domain containing the phosphorylated aspartate is highly conserved. Nevertheless,

investigation of the evolutionary constraints in these interacting modules increases

our knowledge of the evolution of protein-protein interactions.

In conclusion, these studies provide fundamental insights for the research of

bacterial pathogenesis and the molecular evolutionary basis for two-component signal

transduction systems.

References

Archibald JM, Logsdon JM, and Doolittle WF (1999) Recurrent paralogy in the

evolution of archaeal chaperonins. Curr. Biol. 9, 1053–1056.

Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B,

Guild BC, deJonge BL et al. (1999) Genomic-sequence comparison of two

unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature

397, 176-180.

Ambrozic J, Ostroversnik A, Starcic M, Kuhar I, Grabnar M, Zgur-Bertok D (1998)

Escherichia coli ColV plasmid pRK100: genetic organization, stability and

conjugal transfer. Microbiology 144, 343-352.

Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M (2003)

G-language Genome Analysis Environment: a workbench for nucleotide

sequence data mining. Bioinformatics 19, 305-306.

Baranova N, and Nikaido H (2002) The BaeSR two-component regulator system

activates transcription of the yegMNOB (mdtABCD) transporter gene cluster

in Escherichia coli and increases its resistance to novobiocin and

deoxycholate. J. Bacteriol. 184, 4168-4176.

Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A,

Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, and Eddy

SR (2004) The Pfam Protein Families Database. Nucleic Acids Res. Database

Issue 32, D138-D141.

Bentley SD, Chater KF, Cerdeno-Tarraga A-M, Challis GL, Thomson NR, James KD,

Harris DE, Quail MA, Kieser H, Harper D, Bateman A, Brown S, Chandra G,

Chen CA, Collins M, Cronin A, Fraser A, Goble A, Hidalgo J, Hornsby T,

Howarth S, Huang C-H, Kieser T, Larke L, Murphy L, Oliver K, O’Neil S,

Rabbinowitsch E, Rajandream M-A, Rutherford K, Rutter S, Seeger K,

Saunders D, Sharp S, Squares R, Squares S, Taylor K, Warren T, Wietzorrek

A, Woodward J, Barrell BG, Parkhill J, and Hopwood DA (2002) Complete

genome sequence of the model actinomycete Streptomyces coelicolor A3(2).

Nature 417, 141-147.

Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides

J, Glasner JD, Rode CK, Mayhew GF et al. (1997) The complete genome

sequence of Escherichia coli K-12. Science 277, 1453-1462.

Bock A, Gross R (2001) The BvgAS two-component system of Bordetella spp.: a

versatile modulator of virulence gene expression. Int. J. Med. Microbiol. 291,

119-130.

and functional analysis of the pbr lead resistance determinant of Ralstonia
