
A Genome-Wide Study on the Virulence Determinants in Bacteria


Academic year: 2021


National Chiao Tung University

Department of Biological Science and Technology

Doctoral Dissertation

A Genome-Wide Study on the Virulence Determinants in Bacteria

Student: Ying-Tsong Chen (8928802)

Advisor: Hwei-Ling Peng, Ph.D.

September 2004

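Abstract (Chinese)

This thesis consists of three studies on bacterial virulence factors. The first part covers the sequencing and annotation of pLVPK, an important large virulence plasmid of Klebsiella pneumoniae. We sequenced this 219-kb plasmid in full and annotated 251 open reading frames (ORFs) in it. Among these putative genes we found many that are clearly likely to be associated with bacterial virulence, including rmpA2, which controls capsular polysaccharide (CPS) synthesis, and its homolog rmpA, as well as various iron-acquisition system genes such as iucABCD-iutA, iroBCDN, fepBC, and fecIRA. In addition, we found in the plasmid sequence several gene clusters similar to those that confer resistance to copper, silver, lead, and tellurium in other bacteria. Between the major gene clusters on the plasmid we found a total of thirteen regions composed of insertion sequences (IS), which lead us to believe that this large plasmid was very likely formed by recombination of a series of horizontally transferred gene clusters.

In the second part of the thesis, we analyzed the evolution of an important class of virulence factors, the two-component system (2CS) genes, in the complete genome sequence of Pseudomonas aeruginosa. We analyzed the arrangement of these genes and the functional domain composition of the proteins they encode. Then, by comparing the sensors and response regulators

more than half of the two-component systems show clear co-evolution between their two components. We also found that this co-evolution signature is especially pronounced in a group of 2CSs with OmpR-like response regulators. In contrast, it is less evident in some other 2CSs, particularly those with NarL-like response regulators. The correlation found between the separate classifications of sensors and response regulators also supports these results. In addition, the evidence suggests that six groups of 2CSs very likely evolved from six respective common ancestors. Judging from the functions of neighboring genes, these 2CS genes not only arose through duplication of entire gene clusters but also probably retained the same function after duplication. When we further analyzed and compared the untranslated sequences upstream of their open reading frames, we found that the bacterium may apply different transcriptional regulation to these gene clusters to avoid functional overlap.

In the final chapter, we used HMMER to identify OmpR-family 2CSs in 38 microorganisms with completely sequenced genomes and analyzed their evolution. The distribution of OmpR-family 2CSs among these species also supports the theory that 2CSs originated in bacteria and subsequently spread to other species. In general, the genes of OmpR-family 2CSs are arranged in regulator-to-sensor (RS) order. We separately analyzed the 2CSs belonging to the RS group and the SR group (the gene order exactly opposite to RS) and found that the sequences surrounding the phosphorylated histidine residue on their sensors are all very similar. This implies that protein-protein interaction between the two components of a 2CS has probably constrained the evolution of specific functional regions of the proteins. It also indicates that the interaction between sensor and response regulator should have arisen before the two components were assembled into RS or SR gene clusters during evolution. Among these gene clusters, a quite similar gene order and protein functional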

Abstract

This thesis comprises three studies aimed at characterizing bacterial virulence determinants. In the first part, we determined the entire DNA sequence of pLVPK, a 219-kb virulence plasmid harbored in Klebsiella pneumoniae. A total of 251 open reading frames were annotated. The obvious virulence-associated genes carried by the plasmid are the capsular polysaccharide synthesis regulator rmpA and its homolog rmpA2, and multiple iron-acquisition systems, including the iucABCD-iutA and iroBCDN siderophore gene clusters, the fepBC ABC-type transporter, and fecIRA, which encodes an iron-uptake regulatory system. In addition, several gene clusters homologous to copper, silver, lead, and tellurite resistance genes of other bacteria were identified. The presence of thirteen insertion sequences, located mostly at the boundaries of the aforementioned gene clusters, suggests that pLVPK was derived from a sequential assembly of various horizontally acquired DNA fragments.

In the second part, we analyzed the complete genome sequence of Pseudomonas aeruginosa PAO1 to unravel the evolution of a group of important virulence factors, the two-component systems (2CSs). Gene organization and functional motif analyses of the 123 2CS genes in P. aeruginosa PAO1 were

components, we showed that more than half of the sensor-regulator gene pairs, especially the 2CSs with OmpR-like regulators, are derivatives of a common ancestor and have most likely co-evolved through gene-pair duplication, while several of the 2CS pairs, especially those with NarL-like regulators, appeared to be relatively divergent. Correlation between the classifications of sensor kinases and response regulators provides further support for these models. We identified six congruent clades, which represent the group of most recently duplicated 2CS gene pairs. Sequence comparison showed that certain paralogous 2CS pairs may carry a redundant function even after a gene duplication event. However, comparative analysis of the putative promoter regions of the paralogs suggested that functional redundancy could be prevented by differential control.

Finally, in the third part, we analyzed 38 completed genomes using HMMER to identify the putative 2CS components and to investigate the evolution of the OmpR-family 2CSs. The distribution of OmpR-like response regulators among genomes of different taxonomic groups also supported the hypothesis that 2CSs originated in the last common ancestor of bacteria and were subsequently passed to other species. Most of the 2CS genes containing an OmpR-like regulator-encoding gene were found in the order of regulator-to-sensor

This suggested that the interaction between 2CS components may have constrained the sequences of the interacting domains of the sensor kinase and the cognate response regulator while the ancestral components were brought together during evolution into an RS or SR gene cluster, where coordinated transcriptional control may be economically favored. The nearly invariant gene order and the conservation of

Acknowledgments

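Had I not met a mentor like Dr. Hwei-Ling Peng, I would never have had the chance to write these acknowledgments here with such assurance. Many people deserve thanks for helping me complete this important stage of my studies, yet nothing influenced me as deeply over these four years as my advisor's selfless dedication and painstaking guidance. I sincerely thank my advisor, Dr. Hwei-Ling Peng, who bore with my immaturity and stubbornness with boundless patience and love, and who many times led me and pushed me out of the shadows. To have studied and done research here, and to have overcome the difficulties and completed my degree, is truly my good fortune and my honor. I want to thank the colleagues who worked alongside me at National Chiao Tung University: 靖婷, 盈蓉, 榕華, 怡欣, 騰逸, 巧韻, 致翔, 定宇, 美甄, 婉君, 珮瑄, 平輝, 新耀, 祐俊, 健誠, 育盛, 欣穎, 志凱, 心瑋, 佳融, 睿瑜, 昀錚, 頌瑾, and others. I thank the exceptionally conscientious Professor 張晃猷 and Dr. 賴怡琪 of National Tsing Hua University, as well as Nadini, 韶智, Jaya, 文玲, 志宇, 貞儀, 莉芳, and others. Thank you all for so much help and so many wonderful memories. I thank all the faculty of the Department of Biological Science and Technology at National Chiao Tung University, especially Professors 盧錦隆, 邱顯泰, and 楊昀良, who took the trouble to serve on my oral examination committees. I also thank Dean 毛仁淡, who gave me strong medicine when I was at my lowest. I thank Professors 許芳榮, 林耀鈴, and 蔡英德 for their valuable advice on bioinformatics. I also thank Professor 蔡世峰 and 吳克銘 of National Yang-Ming University, who gave me much guidance and help in sequencing and genomics research. I thank my father, my mother, and my sister for their constant support, which let me devote myself to my studies without worry. Finally, I thank my beloved 純沂 and 樂融; I thank my dear wife for her unwavering commitment and support. May my small achievements and happiness be shared with you.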

Contents

Abstract (Chinese)
Abstract
Acknowledgments
List of Tables
List of Figures
Abbreviations
Introduction
Chapter 1
Chapter 2
Chapter 3
Summary
References
Tables
Figures
Appendix I

List of Tables

Table 1
Table 2

List of Figures

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16

Abbreviations

2CS two-component system
Ap ampicillin
bp base pair(s)
BLAST basic local alignment search tool
CPS capsular polysaccharide
COG cluster of orthologous groups
DNA deoxyribonucleic acid
EDTA ethylenediaminetetraacetic acid
G+C G+C content of a gene
GC3 G+C content at the silent third codon position
Hpt histidine-containing phosphotransfer
HMM hidden Markov model
IS insertion sequence
I; T; R; O input, transmitter, receiver, and output domains
kb kilobase(s)
Mb megabase(s)
kD kilodalton(s)
LB Luria-Bertani
Nc number of codons used
mM millimolar
ng nanogram
ORF open reading frame
ori origin of replication
OTU operational taxonomic unit
PBS phosphate-buffered saline
PCR polymerase chain reaction
PAGE polyacrylamide gel electrophoresis
RNA ribonucleic acid
rpm revolutions per minute
SDS sodium dodecyl sulfate
SR; RS sensor histidine kinase and response regulator gene pairs
Tris tris(hydroxymethyl)aminomethane

Introduction

Microbial infections are one of the major causes of death worldwide. In 1998, infectious diseases were the most common cause of death among children (defined by the World Health Organization as those aged between 0 and 4 years), accounting for 63% of all fatalities (http://www.who.org). According to a report of the Centers for Disease Control and Prevention (CDC) in the United States, microbial agents were the fourth leading actual cause of death in the United States in 2000 (http://www.cdc.gov). Finlay and Falkow discussed the definitions of microbial pathogenicity and the idea that pathogens can be distinguished from their non-virulent counterparts by the presence of virulence genes (Finlay and Falkow, 1997). In general, bacterial virulence factors can be divided into several groups. These include adherence and colonization factors, invasins, capsules and surface components, endotoxins, exotoxins, siderophores, the secretion systems for toxin transport, and the two-component systems, by which the expression of many virulence genes is controlled (Krogfelt, 1991; Merritt and Hol, 1995; Payne, 1993; Miller et al., 1989; Finlay and Falkow, 1997).

Beginning in the 1970s, the development of molecular genetics and

advent of whole-genome sequencing, a revolution in infectious disease research has begun and has entered a phase of large-scale production. Genomics, which takes advantage of completed genome sequences, is a top-down approach to the study of the genes in a genome and their functions. Completed genome sequences allow us to decipher the entire physiology of a microbe and the underlying evolutionary processes from the information encoded in its DNA (Vázquez-Boland et al., 1999). Searches for virulence genes can be carried out on a genome-wide scale with a variety of bioinformatic and genetic techniques. The accumulation of genome sequences has also empowered a new scientific discipline, molecular evolution, in which the evolutionary history of genes and organisms can be reconstructed.

The first genome, that of the RNA bacteriophage MS2, was sequenced in 1976 (Fiers et al., 1976). It was followed by the genome of bacteriophage φX174, sequenced with the aid of the rapid sequencing methods developed by Walter Gilbert and Fred Sanger (Maxam and Gilbert, 1977; Sanger et al., 1977). These are among the smallest known genomes, with only four and ten genes, respectively. Subsequently, in 1982, Sanger announced the sequence of a relatively large genome, that of bacteriophage λ, which has 48,502 bases of genomic DNA and ~70 known and

of translation starts and codon usage, the word "homolog" was not used. The first protein sequence database, the forerunner of the Protein Identification Resource, was launched by Margaret Dayhoff in 1965, long before genomics had even become conceivable (Dayhoff et al., 1965). However, it was not until the sequencing era, when a considerable number of completed genomes had been amassed, that the time was ripe for the birth of comparative genomics (Koonin and Galperin, 2002).

The determination of the complete genome sequence of the parasitic bacterium Haemophilus influenzae (Fleischmann et al., 1995) was greatly facilitated by the whole-genome shotgun approach pioneered by Craig Venter, Hamilton Smith, and Leroy Hood (Venter et al., 1996). Since then, completed genome sequences of bacteria and archaea have accumulated steadily. The second genome-sequencing paper, on Mycoplasma genitalium (Fraser et al., 1995), inevitably became a comparative-genomics study. This genome was compared with that of Haemophilus influenzae, and the profound differences in physiology and metabolic capacity between the two bacteria were correlated with the differences in genome content (Fraser et al., 1995). Lineage-specific gene loss, a common type of event in genome evolution, was also identified by genome comparison. For example, the genome of Mycoplasma

Mycobacterium leprae, a species closely related to M. tuberculosis, however, has at least 1,200 fewer genes (Cole et al., 2001). Because any list of completed genomes rapidly becomes outdated, periodically updated listings of both finished and unfinished genome sequencing projects are available at the Genomes OnLine Database web site (GOLD, http://www.genomesonline.org/) (Kyrpides, 1999). By August 2004, more than a thousand genome projects had been launched and hundreds of completely sequenced genomes were available in public databases. The largest prokaryotic genomes (Streptomyces avermitilis among the eubacteria, Methanosarcina acetivorans among the archaea) were sequenced only recently and promise many interesting discoveries (Ikeda et al., 2003; Galagan et al., 2002).

The massive influx of information from genome sequencing projects is revolutionizing the science of bacterial pathogenesis, from understanding the most basic aspects of gene content and pathogen genome organization, to elucidating the regulatory networks of virulence gene expression, to investigating the global patterns of host response to infection. The methods of comparative genomics have made headway in addressing these issues for specific bacterial pathogens. Helicobacter pylori, for example, colonizes the human stomach, where it can cause a wide spectrum of diseases ranging from asymptomatic gastritis to ulcers to gastric

harboring a specific DNA segment, called a pathogenicity island (PAI), which includes a cytotoxin together with a bacterial type IV secretion system that delivers the toxin into host cells (Censini et al., 1996). Genome-sequence comparison was first applied to the search for strain-specific genes when the genome sequences of two unrelated H. pylori clinical isolates were compared (Alm et al., 1999). The results showed that 6% to 7% of the genes appeared to be strain-specific genes encoding cell-surface proteins that most likely contribute to the persistence of the bacteria during long-term infections (Alm et al., 1999; Salama et al., 2000).

Escherichia coli O157:H7 is a cause of food- and water-borne illness that is now a public health problem worldwide. The first genome sequence of pathogenic E. coli O157:H7, from a strain isolated during the major outbreak in Sakai, Japan in 1996, was completed by Hayashi et al. (Hayashi et al., 2001). At about the same time, a second, near-complete sequence was reported for an E. coli O157:H7 strain isolated from hamburger meat (Perna et al., 2001) that had been implicated as the culprit in the first outbreak in North America (Riley et al., 1983). The sequences of the two O157:H7 genomes appeared to be very similar to each other but dramatically different from that of the laboratory strain E. coli K-12 when the sequences were compared using MEM, a system for

Mb larger than the K-12 genome, the O157:H7 genome carried strain-specific DNA, ~10% of which was assumed to have virulence-related functions (Hayashi et al., 2001). The O157:H7 strain-specific DNA is organized into ~180 separate regions in the genome, referred to as O-islands (Perna et al., 2001). Several of these O-islands include virulence determinants such as the bacteriophage-associated Shiga toxin (stx) genes (O’Brien et al., 1992), fimbrial biosynthesis systems, iron uptake and utilization clusters, and putative non-fimbrial adhesins (Perna et al., 2001). The LEE (locus of enterocyte effacement) pathogenicity island is one of the O-islands; it contains ~40 genes encoding the proteins required for the close attachment of bacterial cells to the intestinal epithelium. Acquisition of the LEE island and acquisition of the stx genes were recognized as two of the critical steps in the evolution of E. coli O157:H7 (Reid et al., 2000).

Bacterial plasmids, bacteriophages, and other mobile genetic elements play a key role in a haploid world as the seminal effectors of metabolic diversity and specialization. These mobile elements are often essential components of bacterial pathogenicity (Finlay and Falkow, 1997). It is well established that many pathogenicity factors (and antibiotic-resistance genes) encoding genes on the plasmid

pseudotuberculosis, and all pathogenic Yersinia enterocolitica. The plasmid contains the Yop virulon, which encodes several secreted proteins, the Yops, that damage host cells and paralyze phagocytic cells, and a type III secretion system that mediates the translocation of the Yops (Rosqvist et al., 1995).

Salmonella is the causative agent of food-borne gastroenteritis and typhoid fever. Its ability to cause disease is conferred by numerous pathogenicity islands. The Salmonella genus is divided into two species, S. bongori and S. enterica. The latter, which includes the serovars Typhi and Typhimurium, is responsible for 99% of human infections (Whittam and Bumbaugh, 2002). A comparative genome analysis of the pathogenicity islands (PAIs) of Salmonella enterica serovars Typhi and Typhimurium and of E. coli revealed that the PAI-associated tRNA loci appeared to be species-specific and were horizontally acquired (Hansen-Wester and Hensel, 2002). The differences in the distribution of these tRNA-associated elements likely conferred on the bacteria unique pathogenic potentials, such as a restricted host range or a particular type of disease (Hansen-Wester and Hensel, 2002).

It has also been proposed that certain pathogenic bacteria evolved from related nonpathogenic organisms by genetically acquiring relatively large blocks of

virulence determinants, particularly toxins and adherence factors, were found on mobile genetic elements, which can be distributed to other bacteria by transformation, conjugation, and transduction. This kind of genetic spread is readily seen in the dissemination of R-plasmids and the transposition of antibiotic resistance genes. The study of plasmids is undeniably one of the central issues in the investigation of bacterial pathogenesis.

The geneticist Theodosius Dobzhansky believed that “Nothing in biology makes sense except in the light of evolution” (Lewontin et al., 2003). Comparative genomics has made it possible to explain the most common and important types of events that occur during genome evolution, including genome rearrangement and gene duplication. Its major impact on the study of genome evolution has been to reveal "genomes in flux", changing the classic concept that genomes are relatively stable, evolve through gradual changes, and spread through vertical inheritance (Snel et al., 2002). The comparative study of the proteins of closely related genomes has also improved our understanding of the evolutionary pressure on the molecules involved in the emergence of certain infectious diseases (Whittam and Bumbaugh, 2002).

The upcoming chapters include the study on the virulence determinants in

virulence of Klebsiella pneumoniae, an important opportunistic pathogen of humans. The second chapter focuses on the molecular evolution of a group of virulence factors, the two-component systems (2CSs), in another important human pathogen, Pseudomonas aeruginosa PAO1. 2CSs are the means by which bacteria sense environmental stimuli and make the corresponding physiological responses. By analyzing gene organization and functional motifs, together with comparative phylogenetic analysis, hypotheses on the evolution of the 123 2CS genes were proposed. In the third chapter, the evolutionary constraint on the protein-interacting domains and the conservation of gene organization of the 2CSs were identified through analysis of the genomes of 38 organisms. The era of complete genomes holds promise for the study of bacterial pathogenesis and offers fresh perspectives on the molecular evolution of virulence factors. Comparative genomics and evolutionary biology will surely lead to more interesting discoveries concerning the

Chapter 1

Sequencing and analysis of the large virulence plasmid pLVPK of Klebsiella pneumoniae CG43

1.1 Introduction

Klebsiella pneumoniae is an important cause of community-acquired bacterial pneumonia, which occurs particularly in chronic alcoholics and commonly results in a high fatality rate if untreated. Nevertheless, the vast majority of K. pneumoniae infections are associated with hospitalization. It has been estimated that K. pneumoniae causes up to 8% of all nosocomial bacterial infections in developed countries, and its colonization of hospitalized patients appears to be associated with the use of antibiotics (Schaberg et al., 1991). Recently, the prevalence of multidrug-resistant K. pneumoniae strains has significantly restricted the choice of antibiotics available for effective treatment of these infections.

Despite its significance, our knowledge of the pathogenicity of the bacterium is rather limited. Clinically isolated K. pneumoniae usually produces large amounts of capsular polysaccharide (CPS), as reflected by the formation of glistening mucoid colonies. The CPS provides the bacterium with an anti-phagocytic ability and protects it from being killed by serum bactericidal factors (Simmons-Smit et al., 1986). Additional virulence-associated factors identified so far in K. pneumoniae include lipopolysaccharides, several adhesins, and iron-acquisition systems (Simmons-Smit et al., 1986; Nassif and Sansonetti, 1986). The small number of known virulence-associated factors limits the possible targets for drug development and thus makes intervention in the bacterial infection rather difficult.

Several strategies including in vivo expression technology, subtractive DNA hybridization, and signature-tagged mutagenesis have been adopted to identify virulence-associated genes in K. pneumoniae. These efforts have allowed the identification of many novel genes that might be important for the bacterium to infect humans. For instance, by using the in vivo expression technology, we have identified the presence of a plasmid-borne iron acquisition gene cluster in K. pneumoniae that is primarily expressed in the hosts (Lai et al., 2001). Nevertheless, further investigation of the functional roles of these novel sequences has been significantly hampered by the lack of the complete genome sequence of K. pneumoniae.

Most blood isolates of K. pneumoniae harbor a large plasmid 200 kb in size (Peng et al., 1991). The plasmid has been demonstrated to contain the aerobactin siderophore biosynthesis genes, and curing of the plasmid results in an avirulent phenotype (Nassif and Sansonetti, 1989). In our laboratory, we also found that loss of pLVPK, a plasmid of similar size harbored in K. pneumoniae CG43, a highly virulent clinical isolate of the K2 serotype (Lai et al., 2003), resulted in loss of colony mucoidy, loss of the ability to synthesize aerobactin, and a 1000-fold decrease in virulence. The plasmid is therefore likely to carry many additional virulence-associated genes, and complete sequencing of the plasmid would be the most straightforward way to identify them. We herein report the 219-kb sequence and annotation of this large virulence plasmid from K. pneumoniae CG43.

1.2 Materials and Methods

Sequencing of pLVPK

The DNA of pLVPK was isolated from K. pneumoniae CG43 with a Qiagen Plasmid Purification kit and fragmented by sonication. The DNA fragments were then resolved on a 0.7% low-melting-point agarose gel, and fragments ranging from 2.0 to 3.0 kb were recovered, blunt-end repaired with Bal31 nuclease, and subsequently cloned into the pUC18 vector. A total of 2,304 clones were sequenced from both ends to achieve approximately 11-fold coverage of the plasmid. Sequences were assembled initially using the Phred/Phrap program (Ewing et al., 1998) with optimized parameters and a quality-score threshold of >20. Once all the sequences had been assembled into 11 major contigs (>20 reads; >2 kb), the Consed program (Gordon et al., 1998) was used for the final sequence closure (auto-finishing). Finally, the remaining gaps among contigs were closed either by primer walking on selected clones, which were identified by analysis of the forward and reverse links of each contig, or by sequencing DNA amplicons generated by PCR.
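The numbers in the paragraph above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration, not part of the original assembly pipeline: it applies the standard Phred definition (Q = -10 log10 P) to the >20 threshold and asks what mean usable read length the reported 11-fold coverage would imply. The plasmid length is taken from the Results of this chapter; the assumption that every clone yielded exactly two usable reads, and all function names, are ours.

```python
# Illustrative cross-check of the sequencing figures; names are hypothetical.
PLASMID_LENGTH_BP = 219_385  # plasmid size reported in the Results of this chapter
CLONES_SEQUENCED = 2_304     # clones sequenced, from the text
READS_PER_CLONE = 2          # assumption: one usable read from each end of every clone
REPORTED_COVERAGE = 11.0     # "approximately 11-fold"


def phred_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into a base-calling error probability."""
    return 10 ** (-q / 10)


def fold_coverage(n_reads: int, mean_read_length: float, target_length: int) -> float:
    """Average number of times each base of the target is covered by a read."""
    return n_reads * mean_read_length / target_length


def implied_mean_read_length(coverage: float, n_reads: int, target_length: int) -> float:
    """Mean read length needed for n_reads to give the stated coverage."""
    return coverage * target_length / n_reads


if __name__ == "__main__":
    n_reads = CLONES_SEQUENCED * READS_PER_CLONE
    read_len = implied_mean_read_length(REPORTED_COVERAGE, n_reads, PLASMID_LENGTH_BP)
    print(f"reads: {n_reads}")
    print(f"error probability at Q20: {phred_error_probability(20):.2%}")
    print(f"implied mean read length: {read_len:.0f} bases")
```

Under these assumptions the implied mean read length is a little over 500 bases, and a threshold of Q20 corresponds to an expected error rate of 1% per base call.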

Gene prediction and Annotation

GLIMMER 2.02 (Delcher et al., 1999), a program that searches for protein-coding regions, was used to identify open reading frames (ORFs) of more than 30 codons. Overlapping and closely clustered ORFs were inspected manually. The predicted polypeptide sequences were used to search the protein database with BLAST (NCBI database), and the clusters of orthologous groups of proteins (COGs) database was used to identify the families to which the predicted proteins were related. Mobile elements and repetitive sequences were identified by pair-wise comparison with known insertion sequences. tRNA sequences were identified with the program tRNAscan-SE (Lowe and Eddy, 1997). The G+C nucleotide composition analysis was performed with GCWin of the G-Language package (Arakawa et al., 2003).
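To make the 30-codon length criterion concrete, the Python sketch below scans all six reading frames for ATG-initiated ORFs. It is not GLIMMER: GLIMMER scores candidate ORFs with interpolated Markov models, whereas this sketch only applies the length cut-off. It also treats the sequence as linear (ORFs spanning the origin of a circular plasmid are missed) and considers ATG as the only start codon; both are simplifying assumptions, and the function names are hypothetical.

```python
# Naive six-frame ORF scan; a simplified stand-in for illustration, not GLIMMER.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, start, end, n_codons) for each ATG-initiated ORF.

    start and end are 0-based, end-exclusive coordinates on the forward
    strand and include the stop codon; n_codons excludes the stop codon.
    Only ORFs with more than min_codons codons are reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None:
                    if codon == "ATG":
                        orf_start = i          # first ATG opens the ORF
                elif codon in STOP_CODONS:
                    n_codons = (i - orf_start) // 3
                    if n_codons > min_codons:
                        begin, end = orf_start, i + 3
                        if strand == "-":      # map back to forward coordinates
                            begin, end = n - end, n - begin
                        orfs.append((strand, begin, end, n_codons))
                    orf_start = None           # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 40 + "TAA"           # toy sequence, not from pLVPK
    print(find_orfs(toy))
```

A scan of this kind over-predicts genes, which is one reason the overlapping and closely clustered candidates were inspected manually.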

Drug susceptibility assay

Tellurite, copper, silver and lead susceptibility of the strains was determined essentially as described (Menoharan et al., 2003). E. coli, K. pneumoniae CG43, and its derivatives were propagated at 37 °C in Luria-Bertani (LB) broth. The overnight-grown cells were spread onto LB plates, and 3MM paper discs (5 mm diameter) impregnated with aliquots of a serial dilution of K2TeO3, CuSO4, AgNO3, and Pb(NO3)2 solutions were placed on top of each of the plates. The plates were then incubated at 37 °C for another 12 h and the inhibition zones were measured. Iron acquisition activity was assayed using iron-deprived M9 plates (with 200 µM 2,2’-dipyridyl) and paper discs impregnated with a serial dilution of FeCl3 solution. After the overnight-grown bacteria had been spread onto the plates, the iron-loaded discs were placed on top of each of the plates. The plates were incubated at 37 °C for 12 h and the growth zones around the paper discs were measured.

Nucleotide sequence accession number

The nucleotide sequences reported in this paper have been submitted to GenBank under the accession no. AY378100.


1.3 Results and Discussions

General overview

The entire DNA sequence consists of 219,385 bp forming a circular plasmid (Figure 1). The size and the predicted restriction enzyme cutting sites are consistent with the experimental findings from pulsed-field gel electrophoresis. The plasmid contains 251 ORFs, as determined by the Glimmer program (Appendix I). The possible functions of these ORFs were subsequently analyzed by comparing their sequences with the current non-redundant protein database of the National Center for Biotechnology Information using BLAST over the Internet. Approximately 37% of the 251 ORFs have significant amino acid sequence similarity (>60%) with genes of known function in GenBank or with protein domains or motifs in protein databases. The deduced amino acid sequences of a further 31% of the ORFs, although lacking homology to known genes, matched hypothetical genes in the database. The remaining 32% had low or no significant sequence similarity (<20%) with sequences in the database, and their functions could not be assigned.

The average G+C content of the plasmid is 50.35%, which is somewhat lower than that of the K. pneumoniae MGH78578 genome (G+C = ~55%). The G+C content plotted along the pLVPK sequence with a window size of 1000 bp is shown in Figure 2. Four regions (Boxes 1 to 4) with a significantly higher G+C content than the average of the whole plasmid sequence were identified. Box 1 consists of 9 ORFs showing 56~90% sequence similarity to an unknown gene cluster in the Burkholderia fungorum genome. The second and third high-G+C regions contain two iron acquisition systems, the iut and iro genes, respectively. The fourth box covers the lead-resistance pbr gene cluster and its nearby transposase gene. Two low-G+C regions are also marked in Figure 2; they include the two mucoidy regulator genes, rmpA (34.6%) and rmpA2 (31.9%). The G+C values at the third codon position are even lower: 29.2% for rmpA and 28% for rmpA2.
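The two composition measures used above can be sketched in a few lines of Python. This is an illustration only and not the GCWin implementation: the step between windows is not stated in the text, so non-overlapping windows are assumed, and the third-position measure here counts every third codon position rather than only the silent positions named in the Abbreviations. The function names are hypothetical.

```python
# Minimal G+C composition helpers; an illustration, not the GCWin program.
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 1000, step: int = 1000):
    """Return (start, G+C fraction) for successive windows along seq.

    step is an assumption; step == window gives non-overlapping windows.
    A trailing stretch shorter than one window is ignored.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc3(cds: str) -> float:
    """G+C fraction at the third position of every codon in a coding sequence.

    Simplification: all third positions are counted, including those of
    codons whose third base is not silent (for example ATG and TGG).
    """
    return gc_fraction(cds.upper()[2::3])


if __name__ == "__main__":
    toy = "GGGGAAAA"                           # toy sequence, not from pLVPK
    print(gc_windows(toy, window=4, step=4))   # one G+C-rich and one G+C-poor window
    print(gc3("ATGGCCAAATAA"))                 # third positions are G, C, A, A
```

Regions whose window values stand well above or below the plasmid-wide average, such as the boxes in Figure 2, are candidates for DNA acquired from a donor with a different base composition.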

Virulence-associated genes

The BLAST search revealed an 18-kb region that is highly similar to the SHI-2 pathogenicity island (PAI) of Shigella flexneri (Moss et al., 1999). The SHI-2-like region includes the iron acquisition genes iucABCDiutA, vagCD, the ORF shiF of unknown function, and rmpA2, a known virulence-associated gene in K. pneumoniae (Lai et al., 2003). Outside the PAI-like region, an rmpA2 homolog, rmpA, and two additional gene clusters associated with iron metabolism were also found.

One interesting finding in pLVPK is the presence of rmpA and rmpA2, two genes encoding regulatory proteins for CPS synthesis in K. pneumoniae. CPS is known to be a major virulence factor in K. pneumoniae that protects the bacterium from the bactericidal activity of serum complement and macrophages (Simmons-Smit et al., 1986). The gene rmpA was first identified in K. pneumoniae as a determinant controlling CPS biosynthesis (Nassif et al., 1989). The gene rmpA2, named for its high similarity to rmpA, was identified later (Wacharotayankun et al., 1996). Since the major difference between the two gene products is that RmpA2 has an extended N-terminal region, it has generally been thought that rmpA and rmpA2 are the same gene, and that the rmpA reported earlier by Nassif and colleagues was a truncated form of rmpA2. Our sequencing result shows

in the 194 comparable amino acids), are actually two independent loci 29 kb apart (Figure 3a). Southern hybridization analysis of the plasmid using an rmpA2 probe also confirmed the presence of two copies of the gene (Figure 3b). This finding not only clarified that rmpA is not a part of rmpA2, but also demonstrated that both genes are plasmid-borne. Our laboratory has recently found that the RmpA2 protein interacts directly with the promoters of the K2 CPS biosynthesis genes through its carboxyl-terminal portion, which contains a helix-turn-helix motif (Lai et al., 2003). We therefore believe that RmpA could also interact with the cps gene promoters, although how it activates cps gene expression and how the two Rmp proteins interact remain to be investigated.

The K. pneumoniae vagCD products exhibit 94% and 84% amino acid sequence identity with VagC and VagD, respectively, of pR64 of Salmonella enterica serovar Dublin. Like the vagCD genes of pR64, the two genes overlap by one nucleotide. It has been proposed that VagC and VagD might be involved in coordinating plasmid replication with cell division, and that disruption of the vagC locus reduces bacterial virulence (Pullinger and Lax, 1992). The high sequence similarity suggests that the vagCD genes on pLVPK also participate in maintaining the stability of the plasmid. Interestingly, the G+C content of the vagCD genes (~70%) is significantly higher than that of rmpA2 (31.9%), which is located only 1.1 kb away, implying that rmpA2 and vagCD were recruited onto pLVPK independently.

Iron acquisition systems

The capability of iron acquisition is generally a prerequisite for a pathogen to establish infection when entering the host. In pLVPK, two siderophore-mediated iron acquisition systems, iucABCDiutA and iroBCDN, were identified. The iucABCDiutA operon, first reported on pColV-K30 in E. coli (Ambrozic et al., 1998), consists of five genes responsible for the synthesis and transport of the hydroxamate siderophore aerobactin. The aerobactin synthesis and utilization genes have also been reported in Salmonella and Shigella spp., indicating that the genes are freely transferable within the Enterobacteriaceae. This notion is also consistent with the finding that the iucABCDiutA gene cluster is flanked by two transposable elements, IS630 and IS3, and by the 3’ sequences of E. coli K12 tRNA(Lys) and tRNA(Trp), which have been proposed to play a role in the horizontal transfer of PAIs between bacterial pathogens (Hou, 1999).

The iroBCDEN gene cluster, first described in Salmonella enterica, is known to participate in the uptake of catecholate-type siderophores. Recently, a similar gene cluster contained in a PAI was found either on the chromosome or on a transmissible plasmid in uropathogenic E. coli (Sorsa et al., 2003). It should be mentioned that the iro gene cluster in pLVPK lacks the iroE gene. Nevertheless, the absence of iroE probably does not affect the utilization of catecholate siderophores by the bacterium, since it has been demonstrated in E. coli that an iroE mutation does not hinder siderophore utilization (Sorsa et al., 2003).

A two-gene operon that encodes an ABC-type transporter related to Mesorhizobium loti FepBC was noted on pLVPK at nucleotide positions 77450..80256. The two pLVPK gene products share 38% and 44% identity, respectively, with the corresponding FepBC proteins. These genes also share significant homology with many ABC transporters mediating the translocation of iron, siderophores, and heme (Koster, 2001). Although the contribution of this putative ABC transporter in the uptake of iron


acquisition systems in order to obtain iron from the frequently changing environment. Finally, a gene cluster similar to E. coli fecIRA, which is responsible for regulating the uptake of ferric citrate in a Fe2+-Fur-dependent manner, was identified approximately 3 kb upstream of iroBCDN. In E. coli, the fecIR genes lie within a large gene cluster together with fecABCDE, the structural genes for ferric citrate uptake, which are thought to be the targets of the FecIR regulatory system (Braun et al., 2003). However, no fecABCDE homologs are observable in pLVPK. This is not unusual: as shown in Figure 4, homologs of fecIRA, but not fecBCDE, have been identified experimentally in Bordetella spp. and in several other bacterial species. It is not clear what the target genes of these FecIRA-like regulatory systems are in these bacteria (Braun et al., 2003). One possibility is that a fecABCDE gene cluster is located on the K. pneumoniae chromosome. Alternatively, the genes encoding the FepBC-like ABC-type iron transporter on pLVPK could be the targets of the FecIRA regulators.

It should be pointed out that the pLVPK fecR open reading frame is disrupted by an in-frame termination codon. FecR is an inner membrane protein that senses whether FecA, the outer membrane ferric citrate receptor, is bound to its substrate and in response activates the transcription factor FecI. Deletion analysis of fecR in E. coli has shown that an N-terminal FecR derivative as short as 59 amino acids is still able to activate FecI, resulting in constitutive expression of the downstream target genes (Ochs et al., 1995). Thus, despite the presence of an internal stop codon, the fecR of pLVPK may still encode a truncated but functional product, which may result in a constitutive iron acquisition phenotype in K. pneumoniae CG43.


showed that the plasmid-cured strain, CG43-101, loses the aerobactin activity in comparison with its parental strain CG43. In addition, the iron acquisition activity assay revealed that CG43-101 apparently has a smaller growth zone around the iron-loaded disc. These results indicated that the iron-acquisition capability of the bacteria could mostly be attributed to the plasmid pLVPK.

Genes related to metal resistance

Heavy metals at certain intracellular concentrations may form nonspecific complexes that are toxic to the cell. Many genes for the maintenance of heavy metal ion homeostasis have been identified in bacteria. Three physically linked gene clusters related to the metal resistance phenotype of K. pneumoniae were identified in pLVPK (152306..177234 bp; Figure 5). These gene clusters include homologs of the lead-resistance genes pbrRSABC of Ralstonia metallidurans CH34 (Borremans et al., 2001), the copper-resistance genes pcoEABCDRS of E. coli plasmid pRJ1004 (Brown et al., 1995), and the silver-resistance gene cluster silCBAPsilRSE of S. enterica serovar Typhimurium (Gupta et al., 1999). Using a disk diffusion assay, we found that K. pneumoniae CG43 and its plasmid-cured derivative CG43-101 showed the same level of resistance to silver and copper ions.

A putative lead resistance gene cluster, pbrRABC, showed 63~71% deduced amino acid sequence identity with the corresponding R. metallidurans pbrTRABCD genes. The R. metallidurans lead resistance operon, carried on a large plasmid, pMOL30, contains pbrT for Pb2+ uptake; pbrA for Pb2+ efflux; pbrB for a putative integral membrane protein; pbrC for a putative prolipoprotein signal peptidase; pbrD that


(Borremans et al., 2001). Unlike that of R. metallidurans, the pbr gene cluster of pLVPK contains only the efflux system genes (pbrABC) and the regulator-encoding gene (pbrR) (Figure 5), which suggests a simple lead-efflux mechanism similar to that of the CadA ATPase of Staphylococcus aureus and the ZntA ATPase of E. coli (Rensing et al., 1998). In contrast to the unchanged copper and silver resistance, lead susceptibility increased in the disk diffusion assay after curing of the plasmid. The pbr genes in pLVPK may contribute to the adaptation of K. pneumoniae to lead-polluted human habitats.

A gene cluster homologous to E. coli terZABCDE was also identified. terZABCDE has been shown previously to be part of a PAI in E. coli EDL933 that also contains integrase, prophage, and urease genes (Taylor et al., 2002). This gene cluster also confers resistance to bacteriophage infection and to pore-forming colicins. Although terBCDE are sufficient for the tellurite resistance property, the functions of the individual genes are unknown. The 14.7 kb region (19890..34588 bp) of pLVPK containing the terZABCDE genes and 12 putative ORFs is comparable to the ter-containing region in the E. coli O157 genome. The homology is interrupted downstream of the terZABCDE region by a homolog of the tellurite resistance gene terF of E. coli pTE53 and an IS903 element (Figure 6a). A recent study suggests that the ter-containing pathogenicity island in enterohemorrhagic E. coli isolates was acquired from a plasmid. Given the considerable sequence homology (75~98% amino acid sequence similarity with the respective E. coli O157 terZABCDE products), the ter genes of pLVPK were likely acquired horizontally. It has been speculated that the ter system plays other functional roles, such as protection against host defenses, that allow it to be stably maintained in the bacterium (Taylor et al., 2002).


A chromosomally located ORF showing 77% amino acid sequence identity with the product of the E. coli tellurite resistance gene tehB (Taylor et al., 2002) has also recently been isolated in our laboratory from K. pneumoniae CG43. Deletion of the tehB-like gene had no apparent effect on the tellurite resistance of the bacteria (Figure 6b), suggesting that the tellurite resistance is determined by the ter gene cluster of pLVPK rather than by the tehB homolog.

Replication and plasmid maintenance

DNA sequence analysis also revealed a single plasmid replication region of 1,756 bp (217448..219203 bp), which consists of repA and sequence elements characteristic of plasmid replicons that employ an iteron-based mechanism of replication initiation and control (Chattoraj, 2000). The repA product showed high sequence similarity to a number of plasmid replication initiation proteins, including RepFIB of Salmonella enterica serovar Typhi R27 plasmid (60% identity), RepFIB of E. coli O103:H2 (43% identity), RepA of Yersinia pestis KIM plasmid pMT-1 (42% identity), and RepA of S. enterica serovar Typhi plasmid pHCM2 (42% identity). As shown in the multiple sequence alignment in Figure 7a, RepA appears to be an initiator of plasmid replication that is able to bind the flanking repeated sequences through its DNA-binding structures, a winged-helix domain and a leucine-zipper motif (Chattoraj, 2000). We also found two sets of iterons, four 21-bp and thirteen 42-bp direct repeats, located upstream and downstream of the repA locus, respectively (Figure 7b). These sequences are most likely the specific binding sites at which the RepA protein initiates replication of the plasmid and controls the plasmid copy number (Chattoraj, 2000).


A region (203493..203994 bp) consisting of 11 copies of a 43-bp repeat (5’-gggaccacggtcccacctgcatcgtcgtttaggttttcagcct-3’) is believed to be required for segregation control of the plasmid. Next to the 43-bp direct repeats lie genes encoding sopA and sopB homologs. The organization is comparable to that of the sop operon, which governs the partitioning of the F plasmid (Yates et al., 1999). In addition to sopAB, genes showing sequence similarity to parAB of E. coli phage P1 were identified. It has been shown previously that the corresponding partitioning site in the P1 parAB system is composed of direct or inverted repeats (Davis and Austin, 1988). We also noted a 66-bp direct repeat upstream of the parAB homologs, which suggests that they also contribute to the partitioning control of pLVPK. It is reasonable that such a large plasmid has meticulous maintenance systems. Nevertheless, how these two partitioning systems contribute to the maintenance of pLVPK remains to be confirmed.
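Counting copies of a direct repeat, as described above for the 43-bp unit, can be illustrated with a short script. This is only a minimal sketch: it counts exact, back-to-back copies, whereas repeats in a real plasmid may be imperfect or separated by spacers. The function name `count_tandem_copies` and the toy region are hypothetical; only the repeat unit itself is taken from the text.

```python
def count_tandem_copies(sequence: str, unit: str) -> int:
    """Return the longest run of exact, back-to-back copies of `unit`."""
    sequence, unit = sequence.lower(), unit.lower()
    best = 0
    start = sequence.find(unit)
    while start != -1:
        copies = 0
        pos = start
        # Walk forward one unit at a time while copies keep matching.
        while sequence.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
        start = sequence.find(unit, start + 1)
    return best


# Repeat unit quoted in the text (43 bp).
REPEAT_UNIT = "gggaccacggtcccacctgcatcgtcgtttaggttttcagcct"

# Toy region made up for illustration: three copies with short flanks.
toy_region = "acgt" + REPEAT_UNIT * 3 + "ttga"

print(len(REPEAT_UNIT), count_tandem_copies(toy_region, REPEAT_UNIT))
```

An approximate-matching search would be needed to recover degenerate copies; the exact-match version is kept here for clarity.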

Heterogeneity

Pathogenic bacteria have obtained a significant proportion of their genetic diversity by acquiring DNA from other organisms. Many of the gene clusters identified in pLVPK are homologous to gene clusters of unknown function in other organisms. Although their functions are unknown, homologs of the gene clusters contained in the 9 kb region from nucleotide 2522 to 11618 and the 5.9 kb region from nucleotide 13997 to 19886 were found in the genomes of Burkholderia fungorum and Yersinia pestis KIM, respectively. A gene cluster encoding a putative ABC transporter system (117432..113670 bp) was also identified, whose deduced amino acid sequences are similar to those of a putative ABC transporter system of Streptomyces coelicolor A3(2). A region (46979..51336 bp) comparable to the phage infection inhibition (pif) region of the E. coli F plasmid was also identified. The boundary sequences of these gene clusters, as well as those of the PAI-like region, are mobile elements, including insertion sequences and short pieces of 3’ sequences of tRNA genes. Through the involvement of transposons and tRNA sequences, horizontal gene transfer has allowed these gene clusters to be introduced into the plasmid and hence to affect the ecological and pathological characteristics of the bacteria.


Chapter 2

Evolutionary analysis of the two-component systems in


2.1 Introduction

The two-component system (2CS) is the means by which bacteria commonly regulate adaptive responses to changing environments. A 2CS typically comprises a sensor histidine kinase and a response regulator (Stock et al. 1990). The sensor kinase consists of at least one signal recognition (input) domain coupled to an autokinase (transmitter) domain. Signal binding to the input domain activates the autokinase, which hydrolyzes an ATP molecule to phosphorylate a conserved histidine residue (Stock et al. 1989). The phosphate group is subsequently transferred to the conserved aspartate residue in the receiver domain of a response regulator. Most sensor kinases contain one input domain and one transmitter domain and are hence called classical (IT-type) sensors. Some sensors contain both the sensor kinase signature and a receiver domain of the response regulator and are thus referred to as hybrid or ITR-type sensor kinases (Ishige et al. 1994). A smaller fraction of the hybrid sensors possess an additional output domain at the carboxyl terminus and are referred to as ITRO-type or unorthodox sensor kinases. The response regulator, in


modulate an appropriate expression of the target gene (Parkinson 1993). Response regulators other than transcription factors have also been reported. For example, the CheY response regulator, after being phosphorylated by the kinase CheA, binds to the flagellar motor to promote clockwise rotation of the flagella (Macnab, 1996).

The tight connection between functionally coupled bacterial genes and their chromosomal proximity is a common feature of bacterial genomes (Overbeek et al. 1999; Dandekar et al. 1998). Most 2CS genes encoding functionally coupled sensors and regulators are also physically linked as an operon in the genome. Two models, the co-evolution model and the recruitment model, have been proposed to explain the evolution of 2CS genes. The co-evolution model proposes that the majority of the 2CS genes in a genome have arisen by gene duplication and subsequent differentiation of ancestral 2CSs (Koretke et al. 2000). This is supported by the fact that many of the coupled 2CS genes are concurrent in a genome. On the other hand, the recruitment model suggests that some 2CS operons have evolved through the assembly of a sensor gene and a regulator gene from heterologous 2CSs. Signal transduction between 2CSs encoded by distantly located genes has been reported in the sporulation system of B. subtilis (Kobayashi et al., 1995) and for E. coli hybrid sensor kinases (Hoch and Silhavy, 1995). It is conceivable that, in


operon would be beneficial for a coordinate control of the system.

Pseudomonas aeruginosa is a versatile Gram-negative bacterium that grows in a variety of environmental habitats. Patients with cystic fibrosis, burn victims, and patients requiring extensive hospitalization are particularly at risk of P. aeruginosa infections (Goldberg et al. 2000). The complete genome sequence of P. aeruginosa PAO1 has been determined and published (Stover et al. 2000). The 6.3-Mb genome contains 5,570 predicted genes, of which 123 were annotated as 2CS genes according to the most recently updated database of the Pseudomonas Genome Project. The number of 2CS genes in the P. aeruginosa genome is relatively high in comparison with those in the E. coli and Bacillus genomes, which is likely to be advantageous for the bacterium in adapting to different environments. Nevertheless, the functions of approximately two-thirds of the 2CS genes have not been characterized. In this study, we have analyzed the phylogenetic relationships of the 2CS genes in P. aeruginosa PAO1 in the hope


2.2 Materials and Methods

Nucleotide Sequence Source and Sequence Analysis

The known and putative 2CS genes annotated by the Pseudomonas aeruginosa Community Annotation Project (PseudoCAP) were obtained from the web site http://www.pseudomonas.com. The sequences of sensor kinase genes, hybrid sensor genes, and response regulator genes were collected and converted into FASTA format. The 2CSs were analyzed by homology search using the BLAST programs provided by the National Center for Biotechnology Information through the Internet.
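The conversion of collected sequences into FASTA format can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study; the function name `to_fasta`, the record identifiers, and the toy sequences are all hypothetical.

```python
def to_fasta(records: dict, width: int = 60) -> str:
    """Format {identifier: sequence} pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped to `width` characters per line.
    """
    lines = []
    for identifier, sequence in records.items():
        sequence = sequence.upper()
        lines.append(f">{identifier}")
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical identifiers and toy sequences, for illustration only.
example = {
    "example_regulator": "atgaaacgtattctggtt",
    "example_sensor": "atgcgtctgtttaccaaactg",
}
print(to_fasta(example, width=10), end="")
```

The resulting text can be saved to a file and submitted to BLAST or to an alignment program that accepts FASTA input.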

Multiple Sequence Alignment and Phylogenetic Estimation

Neighbor-joining (NJ) trees were built from the deduced amino acid sequences of sensor kinases and response regulators using CLUSTAL W 1.81 (Thompson et al. 1994). The default substitution matrix (Gonnet) was used for alignments, and positions with gaps were excluded from tree construction. The resultant trees were visualized with TreeView 1.6 (Page, 1996) and MEGA2 (Kumar et al. 2001).
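Excluding gapped positions before tree construction amounts to dropping every alignment column in which at least one sequence carries a gap. The sketch below illustrates that idea only; it is not CLUSTAL W's implementation, and the function name `strip_gap_columns` and the toy alignment are hypothetical.

```python
def strip_gap_columns(alignment: dict, gap: str = "-") -> dict:
    """Remove every column that contains a gap in any aligned sequence."""
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    # Keep only the column indices that are gap-free in every row.
    keep = [i for i in range(length) if all(row[i] != gap for row in rows)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy protein alignment made up for illustration.
toy_alignment = {
    "seqA": "MK-LV",
    "seqB": "MKALV",
    "seqC": "M-ALV",
}
print(strip_gap_columns(toy_alignment))
```

Only the columns shared by all sequences remain, so the distances used for the NJ tree are computed over the same set of sites for every pair.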

For the maximum likelihood analysis, multiple sequence alignments of the amino acid sequences of sensors and regulators from homologous gene clusters were


subsequently excluded manually using BioEdit 4.8.6 (Hall, 1999). Pair-wise distances were analyzed by the PROML algorithm (with the JTT amino acid change model) in PHYLIP 3.6 (Felsenstein, 1993), and 1000 replications of bootstrap sampling were performed for each analysis. Graphical representations of the multiple amino acid sequence alignments, the sequence logos, were produced using WebLogo (Crooks et al., 2004).
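Bootstrap sampling of an alignment resamples its columns with replacement to produce pseudo-replicate alignments of the same length, each of which is then analyzed in the same way as the original. The following is a minimal sketch of the resampling step only, not of PHYLIP itself; the function names, the fixed seed, and the toy alignment are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence,
    # so each resampled column is an intact column of the original.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n_replicates: int = 1000,
                         seed: int = 0) -> list:
    """Return `n_replicates` pseudo-replicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy gap-free alignment made up for illustration.
toy_alignment = {"seqA": "MKLVAR", "seqB": "MRLIAR", "seqC": "MKIVGR"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
print(len(replicates), replicates[0])
```

A tree would be inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it.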

GC%, G+C Content in the 3rd Position of Synonymous Codons (GC3s), and The Effective Number of Codons Used in a Gene (Nc)

The GC% and the G+C content at the 3rd position of synonymous codons (GC3s) were calculated using CodonW (Peden 1999). Nc, a measure of overall codon bias in a gene, was calculated using the CHIPS program with Wright's Nc statistic for an
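The two G+C measures can be illustrated with a short calculation. This sketch assumes the standard genetic code and takes GC3s to be the G+C fraction at the third position of codons that have synonymous alternatives, that is, excluding ATG (Met), TGG (Trp), and the stop codons; it is an approximation of the quantity reported by CodonW, not its implementation. Wright's Nc is not reproduced here. The function names and the toy coding sequence are hypothetical.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons
# are excluded from GC3s under the assumption stated above.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_percent(cds: str) -> float:
    """Overall G+C content of a sequence, in percent."""
    cds = cds.upper()
    return 100.0 * sum(base in "GC" for base in cds) / len(cds)


def gc3s_percent(cds: str) -> float:
    """G+C content at the third position of synonymous codons, in percent."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    informative = [c for c in codons if c not in EXCLUDED_CODONS]
    return 100.0 * sum(c[2] in "GC" for c in informative) / len(informative)


# Toy coding sequence made up for illustration: ATG GCC AAA GGC TGG TAA
toy_cds = "ATGGCCAAAGGCTGGTAA"
print(round(gc_percent(toy_cds), 2), round(gc3s_percent(toy_cds), 2))
```

In the toy sequence only GCC, AAA, and GGC contribute to GC3s, two of which end in G or C.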


2.3 Results and Discussions

Organization of 2CS encoding gene clusters

The 123 annotated 2CS genes in the P. aeruginosa PAO1 genome, comprising 64 sensor and 59 regulator genes, were chosen for this study. The discrepancy with the earlier report of 64 sensor kinase genes and 63 regulator genes (Rodrigue et al. 2000) is most likely due to the recent refinement of the annotation by the Pseudomonas Genome Project. All these 2CS genes were first classified according to their relative location, gene organization, and transcription orientation. As shown in Table 1, each sensor gene was located adjacent to a regulator gene, either directly linked or separated by fewer than 3 open reading frames (ORFs), except for 14 sensor genes, which were assigned as orphan sensors in Group IV. The most common type of gene organization, represented by the 29 2CS gene pairs in Group I, is that in which the regulator gene is located upstream of the sensor gene. Two 2CS gene clusters within this group contain an additional 2CS gene (a sensor gene in Group Ib and a regulator gene in Group Ic), which is transcribed in the opposite direction to the paired regulator and sensor genes. Group Id contains 4 gene clusters with one or three


Group II contains 16 pairs of 2CS genes with the gene order of sensor followed by regulator. There are four 2CS gene clusters in Group III, in which the regulator and the sensor genes are transcribed divergently. The remaining 2CS genes, 14 sensor and 8 regulator genes, are not physically linked to any other 2CS gene and are hence referred to as orphan sensors and regulators, respectively.
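The grouping by gene order and transcription orientation described above can be expressed as a small classification rule. The sketch below is one possible reading of the text, not the procedure used to build Table 1: it classifies a single adjacent sensor-regulator pair from its coordinates and strands, and it leaves the adjacency test (fewer than 3 intervening ORFs) and the orphan genes of Group IV to the caller. The `Gene` record, the function name, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"
    kind: str    # "sensor" or "regulator"


def classify_pair(a: Gene, b: Gene) -> str:
    """Assign an adjacent sensor-regulator pair to Group I, II, or III."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if {first.kind, second.kind} != {"sensor", "regulator"}:
        return "not a sensor-regulator pair"
    if first.strand == second.strand:
        # On the "-" strand the gene with the higher coordinate is upstream.
        upstream = first if first.strand == "+" else second
        return "I" if upstream.kind == "regulator" else "II"
    if first.strand == "-" and second.strand == "+":
        return "III"  # transcribed away from each other (divergent)
    return "convergent (not a group described in the text)"


# Hypothetical coordinates, for illustration only.
reg = Gene("regX", 100, 800, "+", "regulator")
sen = Gene("senX", 797, 2200, "+", "sensor")
print(classify_pair(reg, sen))
```

Genes for which no partner passes the adjacency test would be treated as orphans and never reach this function.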

Analysis of the 2CS genes based on functional motifs

These 2CS genes were further analyzed on the basis of the functional motifs of their gene products. The average length of the response regulator genes was approximately 850 bp. Twenty-four of the regulators are members of the OmpR transcription factor family, which forms the largest group of 2CS response regulators in P. aeruginosa PAO1. Apart from the 11 NarL-, 8 NtrC-, and 5 CheY-type regulators, the remaining 11 regulators, which carry signal-receiving motifs, were found not to contain the conserved C-domain used for classification and were therefore listed as unclassified (Table 1).

In contrast to that of the regulator genes, the size of the sensor genes varies greatly, ranging from 650 bp to 7418 bp. Classification of the 64 sensor genes was as follows


PA0471, which encodes the iron sensor Fur, is the only unclassified sensor gene (Table 1).

Combining the analysis of gene organization and the structural motifs of these gene products, several interesting features were noted as follows:

(1) Almost all (20 out of 23) of the Group Ia gene clusters carry an OmpR-like regulator, and most of the OmpR-like regulators (22 out of 24) are accompanied by a classical (IT) sensor gene located immediately downstream. This observation indicates that the regulator-to-classical-sensor gene order is preferred by the family of OmpR-like regulators. It is likely that most of the regulator-sensor pairs in Group Ia co-evolved by duplication from an ancestral OmpR-IT pair, so that the gene organization remained unchanged. Moreover, in 15 of the 20 OmpR-IT 2CS gene clusters the regulator gene overlaps the downstream sensor gene, which also supports the co-evolution model. In the E. coli K12 MG1655 genome, 11 of the 14 2CSs of the OmpR-like family exhibit the regulator-to-sensor gene order according to the KEGG database (Kanehisa et al., 2002). A similar phenomenon is observed in the genome of Bacillus subtilis, where all 14 2CSs of the OmpR family are in a regulator-to-sensor organization (Fabret et al., 1999). It is likely that most of the 2CS gene clusters of the OmpR family have originated from a


bacteria.

(2) The NarL-like regulator genes, which are found in each of Groups I, II, III, and IV, are linked to genes of IT-, ITR-, or ITRO-type sensors, suggesting an evolutionary strategy different from co-evolution. Instead, they are probably components recruited during evolution.

(3) Ten out of 11 ITR-type hybrid sensors are orphans. The exception is PA1396, which is located next to the NarL-like regulator PA1397 in a divergent transcriptional orientation. The ITR-type sensor, also referred to as the hybrid sensor kinase, contains a regulator-like receiver domain following the input and transmitter domains. Interestingly, most of the ITRO-type sensors, which carry an additional Hpt (histidine-containing phosphotransfer) domain compared with the ITR-type sensors, are adjacent to a response regulator. It has been demonstrated that the phosphorelay specificity of ITRO-type sensors, such as BvgS of Bordetella pertussis and EvgS of E. coli, is determined by the Hpt domain (Perraud et al., 1998). The phosphorelay between the ITR-type kinases and the corresponding response regulators in Vibrio harveyi and Saccharomyces cerevisiae also occurs through Hpt modules, which are, however, encoded by genes located distantly from the


2000). It is likely that, lacking a fused Hpt domain, the ITR sensors act in concert with separate Hpt modules to perform the multi-step phosphorelay in a manner similar to that of the ITRO systems. It has been proposed that a receiver or receiver-Hpt domain may be fused to an IT-type kinase to yield hybrid or unorthodox sensor kinases (Grebe and Stock, 1999). Recruitment of these domains may confer on these systems additional flexibility compared with classical two-component signal transduction.

Phylogenetic analysis of the 2CS genes

The evolutionary relationships among the sensor and regulator genes were estimated by multiple sequence alignment of their deduced amino acid sequences using CLUSTAL W, followed by the neighbor-joining method of tree construction. The 2CSs including four CheA-type sensors (PA0178, PA0413, PA1458, PA3704), 5 sensors (PA1396, PA3078, PA3878, PA4197, PA5262, PA0471) with extraordinary length, and one unclassified regulator PA4843 aligned poorly with the remaining sequences and were thus excluded from the NJ tree construction.

As shown in Figure 8A, the ITR- and ITRO-type sensors apparently formed a group of sub-trees except for the branches h1 and h2. Their close association in the


However, only 4 ITRO- and 1 ITR-type sensors were observed among the 49 sensor-regulator pairs. The question of why the multi-step phosphorelay system is not favored in the bacterium remains to be answered.

Most OmpR-like and NtrC-like regulator-encoding genes were found to form clusters in the tree (Figure 8B), whereas genes encoding members of the NarL family and the unclassified regulators were scattered across the tree, indicating lower sequence similarity. To analyze the historical associations between sensors and regulators, each node in the sensor tree was first assigned an association with the cognate regulators. Each terminal node of the sensor tree was represented by its cognate regulator, while each internal node was represented by the union of the cognate regulators of all descendants of that node. For each node in the sensor tree, the regulator tree was then searched for a node with the same set of descendant regulators. In this way, six clades (congruent monophyletic groups) composed of 11 sensor-regulator pairs, designated clades A to F, were identified (Figure 8A, 8B). As shown in Table 2, the distances of the 2CS pairs in each clade, calculated with PROTDIST, were apparently shorter than those of the 2CS pairs containing NarL-like regulators. The 2CS pairs in each of the clades


co-evolution of the 2CSs in 20 different genomes (Koretke et al., 2000).
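The node-matching procedure used to identify congruent clades can be sketched in a few lines. The sketch assumes rooted trees written as nested tuples of leaf names, which is a simplification (NJ trees are unrooted and would first need to be rooted), and it ignores branch lengths. The function names, the toy trees, and the gene labels S1..S5 and R1..R5 are hypothetical.

```python
def clades(tree) -> list:
    """Return the set of leaves below every internal node of a nested tuple."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def congruent_clades(sensor_tree, regulator_tree, cognate: dict) -> list:
    """Find sensor clades whose cognate regulators also form a clade."""
    regulator_clades = set(clades(regulator_tree))
    n_leaves = len(cognate)
    matches = []
    for sensor_clade in clades(sensor_tree):
        if len(sensor_clade) == n_leaves:
            continue  # the root matches trivially and is skipped
        # Represent the node by the union of its cognate regulators.
        mapped = frozenset(cognate[s] for s in sensor_clade)
        if mapped in regulator_clades:
            matches.append((sensor_clade, mapped))
    return matches


# Toy trees and pairings made up for illustration.
sensor_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
regulator_tree = ((("R1", "R2"), "R4"), ("R3", "R5"))
cognate = {f"S{i}": f"R{i}" for i in range(1, 6)}
print(congruent_clades(sensor_tree, regulator_tree, cognate))
```

In the toy example only the S1/S2 clade is matched by an R1/R2 clade, so it is the single congruent group reported.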

In order to further assess the co-evolution relationships, maximum-likelihood (ML) estimation of the phylogeny was subsequently carried out for the specific groups of 2CSs containing OmpR-, NtrC-, and NarL-like regulators (Figure 9). As shown in Figure 9A and 9B, the 2CS pairs of the OmpR and NtrC families also form congruent clades, whereas the ML trees for the 2CS pairs with NarL-like regulators show different topologies. In Figure 9C, PA1397 is only one branch away from PA3879 (narL) in the regulator tree; however, their corresponding sensors PA1396 and PA3878 (narX) are distantly located from each other. This supports the recruitment model, in which 2CS pairs are assembled from a separately evolved sensor and regulator. It is also consistent with the distance-based analysis, in which the distances between these NarL-group 2CS pairs are relatively long or beyond determination (Table 2).

To measure the dissimilarity between the sensor and regulator trees, the resolved and different quartets were determined (Estabrook et al., 1985). For the trees of the 23 sensor-regulator pairs of the OmpR group, the quartet dissimilarity is 4,251. For the 8 sensor-regulator pairs of the NarL and NtrC groups, the values are 41 and 31, respectively. To compare the tree dissimilarities of different groups, random trees with the same number of OTUs were generated and their dissimilarities measured.


trees. Fewer than 3.57% and 5.65% of the random trees have smaller quartet dissimilarities than the data of the OmpR and NtrC groups, respectively. In contrast, fewer than 19.49% of the random trees have smaller quartet dissimilarities than that calculated for the NarL group (Figure 10). The results show that the tree congruence of the OmpR and NtrC groups is relatively higher than that of the NarL group.
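The quartet comparison can be illustrated as follows. For every set of four OTUs, each tree induces one of three pairings (for example AB|CD), and the dissimilarity is the number of quartets on which the two trees disagree. The sketch assumes fully resolved trees written as nested tuples over the same leaf labels (for instance, each regulator relabelled by its cognate sensor). How the random trees were generated in the study is not stated, so the random-joining scheme and the comparison of random tree pairs below are assumptions, and all function names are hypothetical.

```python
import random
from itertools import combinations


def clades(tree) -> list:
    """Leaf sets below every internal node; the root clade comes last."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(clade_list, quartet):
    """Return the pairing a tree induces on four leaves, or None if unresolved."""
    chosen = frozenset(quartet)
    for clade in clade_list:
        inside = clade & chosen
        if len(inside) == 2:  # this clade separates two leaves from the others
            return frozenset([inside, chosen - inside])
    return None


def quartet_dissimilarity(tree_a, tree_b) -> int:
    """Count the quartets on which two trees over the same leaves differ."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    if clades_a[-1] != clades_b[-1]:
        raise ValueError("trees must have identical leaf sets")
    leaves = sorted(clades_a[-1])
    return sum(quartet_topology(clades_a, q) != quartet_topology(clades_b, q)
               for q in combinations(leaves, 4))


def random_tree(leaves, rng: random.Random):
    """Build a random binary tree by repeatedly joining two random nodes."""
    nodes = list(leaves)
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        right = nodes.pop(j)
        left = nodes.pop(i)
        nodes.append((left, right))
    return nodes[0]


def fraction_of_random_pairs_below(observed, leaves, trials=200, seed=0):
    """Fraction of random tree pairs with a dissimilarity below `observed`."""
    rng = random.Random(seed)
    below = sum(
        quartet_dissimilarity(random_tree(leaves, rng),
                              random_tree(leaves, rng)) < observed
        for _ in range(trials))
    return below / trials


# Toy trees made up for illustration (5 leaves, 5 quartets).
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
observed = quartet_dissimilarity(tree_a, tree_b)
print(observed, fraction_of_random_pairs_below(observed, "ABCDE"))
```

A small fraction indicates that the two trees agree more closely than random trees of the same size typically do.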

Analysis of the sequences around the phosphorylated histidine of the sensor kinases

It has been shown for the B. subtilis 2CSs that the classification of the sequences flanking the phosphorylated histidine in the kinase correlates with the family of the cognate regulator (OmpR, NarL, etc.) (Fabret et al., 1999). The sequences around the phosphorylated histidine residue were therefore used to further classify the sensor kinases. We found that the histidine-containing motifs could be classified into three homologous groups; their sequence logos are shown in Figure 11. The 4 CheY-type regulators were paired with the Class I kinases. Seven out of nine NtrC-type regulators were paired with the Class II kinases. The two exceptions were PA5484, with a regulator-to-sensor transcription order, and PA4293, with two ORFs located in between the regulator and


were paired with the Class II and III sensors, most of the others were paired with the sensors showing low sequence similarity around the histidine residue. This supports the hypothesis that sensors paired with cognate regulators of the NtrC or OmpR type have co-evolved with them as a unit from a common ancestor.

Functional analysis of the most recently duplicated 2CS sensor-regulator pairs

In order to assess the functions of the 2CSs identified in each of the congruent clades, we compared these 2CSs with 2CSs of known function identified in other species, and examined the characterized properties of their adjacent genes. Several interesting findings were noted:

(1) The 2CS genes in one clade may have similar functions. For instance, in clade A, the two 2CS gene pairs pirS (PA0930)/pirR (PA0929) and pfeS (PA2687)/pfeR (PA2686) are parts of the operons pirRSA and pfeRSA, respectively (Figure 12A). Both operons encode siderophore-mediated iron uptake systems and are under the control of the Fur protein (Ochsner and Vasil, 1996). This indicates that the paralogous groups continue to carry out a similar function after gene duplication. Functional redundancy is also seen in the members of clade C, PA3044/PA3045 and PA3946/PA3948, which are likely virulence-related 2CS paralogs (Figure 12B). Both


parapertussis bvgAS, which has been demonstrated to participate in regulating the synthesis of many virulence factors (Bock and Gross 2001). Moreover, the regulator gene PA3947 has been reported to encode a homologue with 45% sequence similarity to the Vibrio cholerae virulence-related protein VieA (Lee et al., 1998) and to the regulator PvrR, which controls antibiotic susceptibility and biofilm formation in P. aeruginosa PA14 (Drenkard and Ausubel 2002). The clustering suggests a related function, since genes clustered in a bacterial genome often share the same function (Overbeek et al., 1999).

(2) Gene rearrangement may have occurred after duplication of the co-evolved 2CS gene cluster. The virulence-associated 2CS gene PA0928 (lemA) is adjacent to the pirRSA operon (Figure 12A). In the Pseudomonas syringae genome, the lemA gene is clustered with the cysteine synthase-encoding gene cysM (PA0932) in a divergent transcriptional orientation, suggesting that the region has been subject to gene rearrangement during the speciation of P. syringae and P. aeruginosa. PA0928, which resides relatively distant from the sensor PA0930 in the tree, appears to have been recruited later by the 2CS pair PA0929/0930. As shown in Figure 12B, the 2CS pair PA3045 and PA3044 in clade C are co-transcribed. However, the PA3946 and


Table 2. The collective distance estimated for some of the 2CS pairs
