
Chapter 3 Changed Epitopes Drive the Antigenic Drift for Influenza A (H3N2) Viruses

3.4.2. Changed epitopes for antigenic variants

Several methods have been proposed to measure whether a changed epitope can escape from neutralizing antibodies [9].

Here, we used the number of mutations accumulated within an epitope to decide whether the epitope has changed, counting either all 329 positions or the 64 selected positions. Figures 3.2 and 3.3 show the relationships between changed epitopes and antigenic variants for the four models.


Figure 3.2 The relationships between the number of changed epitopes and antigenic variants based on the four proposed models. (A) The first model considered an epitope as changed if there is at least one mutation within it. (B) The second model considered an epitope as changed if there are at least two mutations within it. (C) The third model considered an epitope as changed if there are at least two critical mutations within it. (D) The fourth model was derived from Model three and further defined the "1+" type if there are at least 2 and 3 critical mutations in epitopes A and B, respectively.


Figure 3.3 The changed-epitope composition and antigenic variants for the four models. (A) Model one. (B) Model two. (C) Model three. (D) Model four.

Models one and two: Changed epitopes on 329 positions

Figures 3.2A (Model one) and 3.2B (Model two) show the relationships between the number of changed epitopes and "antigenic variants" for 343 pairs of HA sequences with HI assays. For Model one, the number of changed epitopes ranges from 1 to 5 for the 225 "antigenic variants" pairs and from 0 to 5 for the 118 "similar viruses" pairs. Among the 34 similar viruses with more than 4 changed epitopes for Model one, we observed the following: (1) the average number of changed epitopes was 4.2; (2) the average number of changed epitopes with only one mutation was 2.02, and 33 pairs had more than one changed epitope with only one mutation. For example, the virus pair A/PortChalmers/1/73 and A/Singapore/4/75 has four changed epitopes with one mutation each (i.e. epitopes A, C, D and E) (Table 3.3). Under Model one, these 34 similar viruses would be regarded as "antigenic variants" because they have more than four changed epitopes, although the HI assays classify them as similar. This result shows that Model one is not reasonable.

For Model two, the average number of changed epitopes was 2.2 for these 34 similar viruses. According to the distribution (Fig. 3.2B), Model two achieved its highest accuracy when pairs with more than two changed epitopes were considered "antigenic variants". The accuracies for predicting antigenic variants were 74.9% (257/343) on the training set and 92.2% (29410/31878) on the independent set. This result is similar to that of the previous work [9].

Model three: Changed epitopes on 64 selected positions

Model three considers an epitope as changed when it has at least 2 mutations on the 64 selected critical positions. In Model two, the numbers of "antigenic variants" and "similar viruses" with ≥ 3 changed epitopes were 119 and 16, respectively (Fig. 3.2B). The average numbers of changed epitopes with ≥ 2 mutations on the 329 positions for "antigenic variants" and "similar viruses" were 3.8 and 3.2, respectively. The average numbers of changed epitopes with ≥ 2 mutations on the 64 selected critical positions for "antigenic variants" and "similar viruses" were 3.2 and 1.5, respectively (Fig. 3.2C). These observations show that Model three, which uses mutations on the 64 critical positions, discriminates "antigenic variants" from "similar viruses" better than Model two. For example, for the "similar viruses" pair A/Alaska/10/95 and A/France/75/97, the 12 mutations produce zero changed epitopes because no epitope has ≥ 2 mutations on the 64 selected positions.
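The difference between counting mutations on all positions (Models one and two) and on critical positions only (Model three) can be illustrated with a minimal sketch. The position-to-epitope map and the critical set below are toy assumptions for illustration, not the 329-position epitope definition or the 64 selected positions; the function name `changed_epitopes` is likewise hypothetical.

```python
from collections import Counter

# Toy, assumed mapping from HA position to epitope (illustration only).
EPITOPE_OF = {135: "A", 144: "A", 145: "A",
              156: "B", 160: "B", 193: "B",
              226: "D", 229: "D"}

# Toy, assumed stand-in for the 64 selected critical positions.
CRITICAL = {135, 144, 145, 156, 160, 193}


def changed_epitopes(mutated_positions, min_mutations=2, restrict_to=None):
    """Return the set of epitopes with at least `min_mutations` mutations.

    restrict_to=None counts every mutated position (Models one and two);
    restrict_to=CRITICAL counts only critical positions (Model three).
    """
    counts = Counter()
    for pos in mutated_positions:
        if restrict_to is not None and pos not in restrict_to:
            continue  # ignore mutations outside the critical set
        epitope = EPITOPE_OF.get(pos)
        if epitope is not None:
            counts[epitope] += 1
    return {e for e, n in counts.items() if n >= min_mutations}
```

With `min_mutations=1` the function follows the Model one criterion, with the default it follows Model two, and passing `restrict_to=CRITICAL` gives the Model three criterion, under which two mutations on non-critical positions of an epitope no longer count as a change.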

Three HA/antibody complex structures [10] provide structural evidence for the changed epitopes (Fig. 3.4). Among these complexes, two antibodies bind to epitopes A and B (PDB codes 1KEN [58] and 2VIR [59]), while the third binds to epitopes C and E (PDB code 1QFU [60]). The antibodies consistently bind to two epitopes, which agrees with Models two and three. The HA/antibody structures and Models two and three show that mutations at two positions often induce a conformational change of an epitope that allows escape from antibody recognition. However, the numbers of changed epitopes of 48 "similar viruses" pairs are 2 (35 pairs) and ≥ 3 (16 pairs) for Model two (Fig. 3.2B). In contrast, 14 "similar viruses" pairs have more than 2 changed epitopes for Model three (Fig. 3.2C).


Figure 3.4 The three HA-antibody complex structures. PDB codes are (A) 1KEN [58], (B) 2VIR [59] and (C) 1QFU [60]. In all three structures, the antibody binds to two epitopes on HA through its heavy chain (pink) and light chain (green). The five epitopes on HA are labelled (epitope A in red; B in purple; C in orange; D in cyan; E in green).

Model four

Among the 72 "antigenic variants" pairs with one changed epitope based on Model three, 70 pairs change on epitope A or B. That a single changed epitope on A or B can cause "antigenic variants" agrees with the HA/antibody complex structures and the experiments. The receptor-binding site, surrounded by epitopes A and B, is a basis of the neutralizing mechanism for HA [58, 61] (Fig. 3.1B).

Based on this observation, epitopes A and B play a key role for neutralizing antibodies. Model four, based on Model three, considers a pair of HA sequences as "antigenic variants" when there are ≥ 2 changed epitopes or ≥ 1 changed epitope on A or B. In Model four, a pair of HA sequences with ≥ 3 mutations on the 64 critical positions of epitope B is regarded as "antigenic variants". Thus, we annotated a virus pair with a single changed epitope on A or B as the "1+" type (Fig. 3.3D). For example, the pair A/Guizhou/54/89 and A/Beijing/353/89 has a changed epitope A (i.e. mutations at positions 135, 144 and 145) (Table 3.3). The accuracies of Model four were 81.6% and 94.0% on the training set and independent set, respectively. This model outperformed the two compared methods, i.e. Wilson & Cox (89.7%) [9] and Lee & Chen (92.4%) [40], on the independent data set (Fig. 3.5).
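The decision rule of Model four, as described above and in the caption of Figure 3.2D, can be written as a short sketch. The function name `model_four_label`, the input format (a dictionary of per-epitope counts of mutations on critical positions) and the returned strings are assumptions made for illustration.

```python
def model_four_label(critical_counts):
    """Classify a pair of HA sequences with the Model four rule.

    critical_counts maps an epitope name ("A".."E") to the number of
    mutations on critical positions within that epitope.
    """
    # Model three criterion: an epitope changes with >= 2 critical mutations.
    changed = {e for e, n in critical_counts.items() if n >= 2}
    if len(changed) >= 2:
        return "variant"
    # "1+" type: a single changed epitope on A (>= 2 critical mutations)
    # or on B (>= 3 critical mutations).
    if critical_counts.get("A", 0) >= 2 or critical_counts.get("B", 0) >= 3:
        return "variant (1+)"
    return "similar"
```

For the A/Guizhou/54/89 and A/Beijing/353/89 pair, the three mutations on epitope A (positions 135, 144 and 145) correspond to `model_four_label({"A": 3})`, which returns the "1+" type.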

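[Figure 3.5: bar chart of accuracy (y-axis, 50% to 100%) for Wilson & Cox, 1990; Lee & Chen, 2004; Model One; Model Two; Model Three; and Model Four, on the 343 pairs and the 31,878 pairs. Data labels in plotted order: 73%, 76%, 74%, 75%, 77%, 82% (343 pairs) and 90%, 92%, 90%, 92%, 92%, 94% (31,878 pairs).]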

In the HA/antibody complex structure (PDB code 1KEN [58]), the antibody binds to epitopes A and B using two CDRs (i.e. CDR1 and CDR3) on the heavy chain and one CDR (i.e. CDR2) on the light chain (Fig. 3.6). The interface between the antibody and HA consists of 13 and 5 contact residues located on epitopes B and A, respectively. Among these 13 positions, 7 were selected as critical positions. Based on Model four, 46 "antigenic variants" pairs have a single changed epitope B with 3 mutations on epitope B, denoted as "B+". This result suggests that a single changed epitope B can cause antigenic variants. For example, the pair of virus strains A/NewYork/55/2004 and A/Anhui/1239/2005 has three critical mutations on epitope B (i.e. positions 156, 160 and 193) (Table 3.3). According to the HA/antibody structure (Fig. 3.6), residue 156 interacts with CDR2 (position 55 on the antibody), and residue 193 interacts with three residues on CDR2 (positions 50, 55 and 57) and one residue on CDR3 (position 105). This structure suggests that mutations at residues 156, 160 and 193 can induce a conformational change of epitope B that allows escape from CDR2 and CDR3 of the neutralizing antibody.


Figure 3.6 The HA/antibody structure and interface. (A) The antibody and HA trimer (PDB code 1KEN [58]). (B) The interface of the antibody and HA. The critical positions on epitope B and the CDRs of the antibody are labelled.


Figure 3.7 The epitope evolution and the antigenic drift from the 1982-1983 to the 2008 influenza season. (A) The distributions of variant ratios of WER strains from the 1982-1983 to the 2008 season. Matches between Model four and WER are labelled (match, red arrows; no match, blue arrows). (B) The average Hamming distances (HD) of the 5 epitopes from 1982-1983 to 2008.

Antigenic drift and epitope evolution

We utilized the changed epitopes to study the antigenic drift on 2,789 circulating strains ranging from influenza season 1982-1983 to 2008 (36 influenza seasons). One of WHO surveillance network's purposes is to detect the emergence and spread of antigenic variants that may signal a need to update the composition of influenza vaccine [14-15]. Here, we considered an emerging antigenic variant according to WER strain, which was the dominant strain in each influenza season [34] (Table 3.1). For a selected season, we applied Model four, measuring changed epitopes for the pairs between the vaccine and circulating strains for "antigenic variants", and the

Among the 36 influenza seasons, our model detected 12 seasons with emerging antigenic variants (VR ≥ 0.5), and 10 of them were followed by an update of the WER strain in the next season (Fig. 3.7A). For example, the 1985-1986 season, in which 80% of the circulating strains had the changed epitope "B+" (Fig. 3.7B), is the first season with emerging antigenic variants, and the WER strain was updated in the next season (i.e. from A/Mississippi/1/85 to A/Leningrad/360/86). Moreover, among seven "emerging antigenic variants" seasons (matching WHO vaccine updates), four seasons (i.e. 1989-1990, 1991-1992, 1995-1996 and 2002-2003) matched the antigenic cluster transitions proposed by Smith et al. [15]. The other three seasons, which were detected by one changed epitope on A or B, are consistent with the WER strain updates (i.e. 1985-1986, 1987-1988 and 1999). These observations suggest that "emerging antigenic variants" with ≥ 2 changed epitopes may cause major antigenic drift, while "emerging antigenic variants" with one changed epitope on A or B may cause minor antigenic drift.

To observe the epitope evolution, Figure 3.7B illustrates the Hamming distance (HD) on the 64 critical positions of each of the five epitopes. For example, the VR of the 1985-1986 season was 0.8 (Fig. 3.7A) and the epitope with the largest HD was epitope B (HD of 3.4). For the 15 seasons with WER strain updates, the average HDs of epitopes A, B, C, D and E were 1.2, 2.1, 0.5, 0.4 and 0.4, respectively. These observations show that epitopes A and B change more frequently in vaccine update seasons and play a key role in the antigenic drift.
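The two per-season quantities used here can be sketched as follows. The sketch assumes that the variant ratio (VR) of a season is the fraction of circulating strains predicted to be antigenic variants relative to the vaccine strain, which is consistent with the 1985-1986 example (80% of strains, VR of 0.8) but is an interpretation rather than a quoted definition. The function names and the toy epitope positions in the usage note are hypothetical.

```python
def variant_ratio(variant_flags):
    """Fraction of circulating strains flagged as antigenic variants (assumed VR)."""
    flags = list(variant_flags)
    if not flags:
        raise ValueError("need at least one circulating strain")
    return sum(bool(f) for f in flags) / len(flags)


def is_emerging_season(vr, threshold=0.5):
    """A season has emerging antigenic variants when VR >= 0.5."""
    return vr >= threshold


def epitope_hamming(seq_a, seq_b, epitope_positions):
    """Hamming distance per epitope; positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return {epitope: sum(seq_a[p - 1] != seq_b[p - 1] for p in positions)
            for epitope, positions in epitope_positions.items()}


def average_epitope_hamming(vaccine, circulating, epitope_positions):
    """Average per-epitope HD between a vaccine strain and circulating strains."""
    if not circulating:
        raise ValueError("need at least one circulating strain")
    totals = {epitope: 0 for epitope in epitope_positions}
    for strain in circulating:
        for epitope, hd in epitope_hamming(vaccine, strain, epitope_positions).items():
            totals[epitope] += hd
    return {epitope: total / len(circulating) for epitope, total in totals.items()}
```

As a usage example with toy data, `average_epitope_hamming("ACDEFG", ["ACDEFG", "GCDKFG"], {"A": [1, 2], "B": [4]})` averages the per-epitope distances of two circulating strains against one vaccine strain; in the analysis above, the position lists would instead be the critical positions of each epitope.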

Figure 3.8 The comparison between our method and Wilson & Cox's model [9] in the antigenic drift from the 1982-1983 to the 2008 influenza season. The plot shows the variant ratio (0.0 to 1.0) per influenza season for Model four, Wilson & Cox, and Wilson & Cox on the 64 critical amino acids, with the WER strains labelled (PH82, MI85, LE86, SH87, SI87, BE89, BE92, SD93, JO94, WU95, SY97, MW99, FU02, CA04, WI05 and BR07).

Table 3.5 Example of 13 antigenic variants without changed epitopes

Fujian/140/2000 Chile/6416/2001 V S 12 144 186, 194 273 226, 246, 247
Hong_Kong/1/94 Guangdong/25/93 V S 8 124 47, 299 96, 216, 219, 226 92
Panama/2007/99 Chile/6416/2001 V S 7 144 186 246
Wellington/1/2004 Singapore/68/2004 V S 9 145 189 50 226, 227 94
Wellington/1/2004 Victoria/513/2004 V S 5 145 186, 190 167, 226
Wellington/1/2004 Wisconsin/19/2004 V S 5 138, 145 186 278 226
Fujian/140/2000 NewYork/55/2001 V V 12 144 186, 194 273 226, 229, 247
Victoria/3/75 Victoria/112/76 V V 2 229

1 The antigenic type of virus B relative to antisera against virus A.
2 The antigenic type of virus A relative to antisera against virus B.

3.5. Discussion

Based on the HI assays accumulated from 1968 to 2008, we identified 64 critical positions on HA. Among these 64 critical positions, 10 are not located on any of the five epitopes. Furthermore, 4 of these 10 positions (positions 25, 202, 222 and 225) were almost conserved from 1968 to 2000 and underwent a frequency switch [41] after 2000. These newly emerging positions suggest that previously conserved positions may become new antibody binding sites and that IG can identify newly emerging positions from HI assays. Moreover, the emerging mutations also reveal a need to update the epitope definition, which was proposed before 1999 [9, 31].

According to the distribution of antigenic variants for Model four (Fig. 3.3D), it is interesting that most of the antigenic variants (209/225 pairs) have a changed epitope on A or B. In addition, among the 85 pairs of antigenic variants that had one or no changed epitopes, 56 pairs had ≥ 3 mutations in epitopes A and B. These observations suggest that epitopes A and B could be considered as one epitope, with their mutations summed, since both epitopes are close to the receptor-binding site.

Furthermore, many experiments suggested that the occlusion of the receptor-binding site by antibodies bound to the HA molecule forms the dominant neutralizing mechanism [58, 61].

Wilson & Cox [9] suggested that a viral variant usually contains more than 4 residue

that 215 of them match Wilson and Cox's model. However, we also observed that 83 of the 118 similar viruses match their observation, which implies that their model has little ability to discriminate between antigenic variants and similar viruses. We also applied Wilson & Cox's model to detect the antigenic drift from season 1982-1983 to 2008 (Fig. 3.8). Among the 36 influenza seasons, their model detected 25 seasons with emerging variants (VR ≥ 0.5), and 15 of them were followed by an update of the WER strain in the next season. Furthermore, their model detected only one season without emerging antigenic variants from season 1982-1983 to 1999, which suggests that their model is less able to discriminate between seasons with and without emerging variants. We further compared our model with Wilson & Cox's model and found that the major difference was due to the critical positions. When we applied Wilson & Cox's model only to the 64 critical positions, the number of false positives decreased from 10 seasons to only 3 seasons, which suggests that these critical positions are crucial for antigenic drift.

Gupta et al. proposed an antigenic distance between the vaccine strain and circulating strains [30]. The proposed distance quantitatively measures the degree of change in the "dominant epitope", which is the epitope with the largest fractional change in protein sequence. They further correlated the antigenic distance with the vaccine efficacies of 19 influenza A (H3N2) vaccines from 1971 to 2004. Among the 19 comparisons of vaccine strain and circulating strain, 13 involved different vaccine and circulating strains. It is interesting that all 13 of these pairs had dominant epitope A or B, which suggests that the epitopes adjacent to the receptor-binding site are crucial for the antigenic drift. In addition, our model was validated on 2,789 circulating strains, while the proposed antigenic distance was validated on only 19 circulating strains.

Among the 225 "antigenic variants" pairs, 13 pairs have no changed epitopes (Table 3.5). Of these 13 pairs, 11 have contradicting antigenic types from the two antisera, which suggests that a more powerful experimental assay is required to verify the antigenic types. For example, the antibody against the A/Alaska/10/95 strain cannot inhibit the A/Idaho/4/95 strain, while the antibody against the A/Idaho/4/95 strain inhibits the A/Alaska/10/95 strain.

HA is a trimeric protein and each subunit includes two chains, HA1 and HA2 [62]. Recently, Ekiert et al. identified an antibody that binds a new epitope in the stem formed by the HA1 and HA2 chains [63]. The antibody blocks the conformational changes that are required for fusion of the viral membrane. This new epitope spans two chains of HA and can be regarded as two epitopes, which also implies that two changed epitopes can escape the neutralizing antibody.

3.6. Summary

This study demonstrates that our model is robust and feasible for quantifying changed epitopes. According to the distribution of antigenic variants in HI assays and the HA/antibody complex structures, we found that mutations at two critical positions with high genetic diversity and antigenic scores can induce a conformational change of an epitope. Epitopes A and B, which are close to the receptor-binding site of HA, play a key role for neutralizing antibodies. Furthermore, two changed epitopes often drive the antigenic drift and can be used to explain the WHO vaccine strain selection. We believe that our method is useful for vaccine development and for understanding the evolution of influenza A viruses.

Chapter 4 A Bayesian Approach for Quantifying the Antigenic Distance of Influenza A (H3N2) Viruses

4.1. Introduction

Influenza viruses often cause significant human morbidity and mortality [23]. The viral surface glycoproteins, HA and NA, are the primary targets of the protective immune system. The viruses continually evade the host immune system through accumulated mutations on HA that change its antigenic properties over time. The degree to which immunity induced by one (e.g. vaccine) strain is effective against another (e.g. circulating) strain depends mainly on the antigenic difference between the two strains [14]. Thus, studies of antigenic differences among strains are important for vaccine strain selection, and many methods have been proposed to study antigenic drift and vaccine development [15, 25, 31-32].

Among the sequence-derived methods for measuring the antigenic difference between strains, the Hamming distance (HD) is one of the best known. It counts the number of mutations between a pair of sequences and treats all amino acid positions as antigenically equivalent. However, not all positions on HA are on the surface or on antibody-combining sites that are recognizable by antibodies [9]. Moreover, some functional sites are evolutionarily conserved (e.g. serine 136 and tyrosine 98 in the receptor-binding site) [11]. Furthermore, Smith et al. demonstrated that "antigenic evolution was more punctuated than genetic evolution, and genetic change sometimes had a disproportionately large antigenic effect" [41].

4.2. Motivation and aim

Based on the experimental results from the previous two chapters, we observed that some positions were crucial for antigenic variants (e.g. position 145), while other positions had little effect on antigenic variants (e.g. position 226). We also noticed that mutations on epitopes A and B seem more likely to cause antigenic variants. These observations raise the question of whether the amino acid positions are antigenically equivalent.

Here, we propose a Bayesian approach [64-66] to identify the antigenic drift of influenza A by quantifying the antigenic effect of each amino acid position on HA. We used the likelihood ratio (LR) to quantify the antigenic distance of an amino acid position. Based on a naïve Bayesian network and the LR, we developed an index, ADLR, to quantify the antigenic distance of a given pair of HA sequences. Our experimental results show that the positions located on the epitopes and near the receptor-binding site are crucial to the antigenic drift. In addition, the ADLR values are highly correlated with the HI assays and can explain the WHO vaccine strain selection from 1968 to 2008.
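To make the idea of a per-position likelihood ratio concrete, the following is a minimal sketch of one common form of LR for a binary feature: the probability of observing a mutation at a position among "antigenic variants" pairs divided by the same probability among "similar viruses" pairs. This form, the add-one smoothing (`pseudocount`) and the function name `likelihood_ratios` are assumptions for illustration; the sketch does not reproduce the ADLR index and may differ from the exact LR used in this chapter.

```python
def likelihood_ratios(bit_strings, labels, pseudocount=1.0):
    """Per-position LR = P(mutation | variant) / P(mutation | similar).

    bit_strings: list of equal-length lists of 0/1 (1 = mutation).
    labels: list of booleans, True for "antigenic variants" pairs.
    pseudocount: assumed add-one style smoothing to avoid division by zero.
    """
    n_positions = len(bit_strings[0])
    n_variant = sum(bool(y) for y in labels)
    n_similar = len(labels) - n_variant
    ratios = []
    for i in range(n_positions):
        mutated_variant = sum(bits[i] for bits, y in zip(bit_strings, labels) if y)
        mutated_similar = sum(bits[i] for bits, y in zip(bit_strings, labels) if not y)
        p_variant = (mutated_variant + pseudocount) / (n_variant + 2 * pseudocount)
        p_similar = (mutated_similar + pseudocount) / (n_similar + 2 * pseudocount)
        ratios.append(p_variant / p_similar)
    return ratios
```

Under this assumed form, a ratio above 1 indicates a position that mutates more often in "antigenic variants" pairs than in "similar viruses" pairs, and a ratio below 1 indicates the opposite.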

Figure 4.1 Overview of our method for quantifying the antigenic distance for amino acid positions and a pair of HA sequences. (A) The flowchart of training set preparation. (B) The calculation of LR for an amino acid position to quantify its antigenic distance. (C) The calculation of ADLR for a pair of HA sequences to quantify their antigenic distance based on a naïve Bayesian network [64-65].

4.3. Materials and methods

Figure 4.1 shows the overview of our method for quantifying the antigenic distance for amino acid positions and a pair of HA sequences by calculating the likelihood ratio (LR) and ADLR based on a naïve Bayesian network [64-65].

4.3.1. Data sets

HI assays

For the H3N2 virus, HI assay data have accumulated for almost 40 years since 1968, and these data were used in this study to quantify the antigenic distance of amino acid positions. We first collected influenza H3N2 virus HI assays from the Weekly Epidemiological Record (WER) [Table 4.1], public documents from the World Health Organization (WHO) collaborating center [Table 4.1] and publications [Table 4.1]. Then, we searched the influenza virus resource [36] and the influenza sequence database [35] for the HA sequences of the H3N2 viruses with HI assays. The number of collected HI assays with available HA sequences is 636 pairs, and a subset of 343 pairs with 125 HA sequences was selected as the training set. In the training set, 183 pairs and 106 pairs were collected from WER and the WHO collaborating center, respectively. Most of the samples (72%, 249 of 343 pairs) consist of pairs of vaccine and circulating strains, and for each pair it is known whether the circulating strain is inhibited by antibodies against the vaccine strain ("antigenic variants" and "similar viruses"). Vaccine strains are selected by WHO and are often the dominant strains of influenza seasons. Each pair includes an HI assay value (i.e. antigenic distance) and a bit string of 329 binary bits obtained by aligning the pair of HA sequences (329 amino acids). For a specific position on a pair of HA sequences, the binary value is "1" (named "mutation") if the residue types of the two sequences at this position are different; otherwise, the binary value is "0" (named "no mutation"). In general, an influenza vaccine should be updated if the antigenic distance between the current vaccine strain and the strains expected to circulate in the next season is more than 4.0 [15]. The antigenic distance between strains A and B is the reciprocal of the normalized HI assay of B relative to antisera raised against A [55]. Among the 343 pairs of HA sequences, 225 pairs with antigenic distance ≥ 4 are considered "antigenic variants" and 118 pairs with antigenic distance < 4 are considered "similar viruses". For example, the antigenic distance of the pair of HA sequences A/England/42/72 and A/Port_Chalmers/1/73 is 12, and this pair is considered an "antigenic variant". Conversely, the antigenic distance of the pair of HA sequences A/Wuhan/359/95 and A/Nanchang/933/95 is 1, and this pair is considered a "similar virus".
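The encoding and labelling of a training pair described above can be sketched as follows. The function names are hypothetical, and `antigenic_distance` assumes that the normalized HI assay is the heterologous titer divided by the homologous titer, so that its reciprocal is the homologous titer divided by the heterologous titer; this reading of the normalization is an assumption.

```python
def mutation_bits(seq_a, seq_b):
    """Bit string for an aligned pair: 1 = mutation, 0 = no mutation."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [int(a != b) for a, b in zip(seq_a, seq_b)]


def antigenic_distance(homologous_titer, heterologous_titer):
    """Reciprocal of the normalized HI titer (assumed normalization)."""
    return homologous_titer / heterologous_titer


def antigenic_label(distance, cutoff=4.0):
    """Pairs with antigenic distance >= 4 are "antigenic variants"."""
    return "antigenic variant" if distance >= cutoff else "similar virus"
```

For the two examples in the text, `antigenic_label(12)` gives "antigenic variant" and `antigenic_label(1)` gives "similar virus". For full HA sequences, `mutation_bits` would return the 329-bit string described above.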

HA sequences

The HA sequences of the H3N2 virus were downloaded from the influenza virus resource [36] on April 5, 2008. After removing the sequences with fewer than 981 nucleotides or with repeated strain names, the number of sequences was 5,959. Of these, 4,548 can be further partitioned into influenza seasons according to their date of isolation or Plotkin's study [34]. Among the sequences in the same influenza season, identical strains from the same geographic area were removed. This left 2,789 sequences (Table 4.2) distributed over 36 seasons ranging from 1983 to 2008. The influenza season is defined as 1 October through 30 September for the years before 1999. For example, the "1982-1983 season" refers to those sequences collected between 1

