Chapter 2 Manuscript

2.4 Discussion

One of the challenges in developing reliable continuous B-cell epitope predictors is dealing with epitope variability. A number of approaches for continuous B-cell epitope prediction have been developed based on the physicochemical properties of antigenic proteins. However, these studies relied on a single sequence of the antigen, from which the physicochemical properties were derived. Because epitopes show considerable variability, the effectiveness of antibodies generated through a reverse immunology approach is therefore questionable. In this study, I explored the total free energy associated with the 20L possible mutant structures resulting from single-site mutations in a protein structure of sequence length L. To the best of my knowledge, the energy of the free antigen structure has not previously been used in B-cell epitope identification tasks. Based on free energy, I identified which of the 20L point mutations are more likely to occur. Although mutations are introduced into DNA at random, certain amino acid substitutions are rarely observed in nature. These substitutions cause severe steric clashes between amino acid side chains, which lead to thermodynamically unstable protein structures. Analysis of total free energy therefore provided a way to eliminate mutations that are thermodynamically unstable and hence unlikely to be observed in nature. In total, I constructed 44 energy-related features, which can be grouped into three classes: FEavg, FEdiff, and FEss. I evaluated the performance of k-NN, SVM, and ANN classifiers trained with these novel energy-related features. The FEdiff and FEss features, collectively referred to as FEbest features, proved particularly relevant to the prediction of continuous B-cell epitopes. When I further evaluated the sensitivity of the classifiers to window size, the results indicated that classifiers trained with 3D FEbest features outperform those trained with 1D FEbest features.
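The mutation-scanning step described above can be illustrated with a minimal Python sketch. It assumes a caller-supplied `energy` function that scores a sequence. In this study the total free energy was computed on the antigen structure, and the sketch does not reproduce that calculation, so `toy_energy` below is purely hypothetical. The stability cutoff `max_delta`, the helper names, and the per-position mean are likewise illustrative assumptions. The mean is not the definition of FEavg, FEdiff, or FEss. The sketch also assumes that "20L" counts the wild-type residue at each position.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_site_variants(seq: str) -> List[Tuple[int, str, str]]:
    """Enumerate 20 * L (position, wild type, substitute) triples.

    Assumption: the wild-type residue is counted at each position, giving 20L.
    """
    return [(i, seq[i], aa) for i in range(len(seq)) for aa in AMINO_ACIDS]


def stable_mean_delta(
    seq: str,
    energy: Callable[[str], float],
    max_delta: float,
) -> Dict[int, float]:
    """Per-position mean energy change over variants not flagged as destabilizing.

    `max_delta` is a hypothetical cutoff: variants whose energy rises by more
    than this amount relative to the wild type are discarded as unstable.
    """
    e_wt = energy(seq)
    kept: Dict[int, List[float]] = {i: [] for i in range(len(seq))}
    for i, _wt, aa in single_site_variants(seq):
        mutant = seq[:i] + aa + seq[i + 1:]
        delta = energy(mutant) - e_wt
        if delta <= max_delta:  # keep only variants judged thermodynamically viable
            kept[i].append(delta)
    # Positions where every variant was discarded get no entry.
    return {i: mean(d) for i, d in kept.items() if d}


def toy_energy(seq: str) -> float:
    """Hypothetical toy energy for illustration only: counts F, W, and Y residues."""
    return float(sum(c in "FWY" for c in seq))


print(stable_mean_delta("AGW", toy_energy, 0.5))
```

With this toy energy, the three aromatic substitutions at the first two positions raise the energy by 1.0 and are discarded. All 20 variants at the tryptophan position are kept.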
In addition, the performance of the 3D FEbest features was compared with that of features previously used in continuous B-cell epitope prediction. The 3D FEbest features performed comparably to the AAP antigenicity scale and outperformed the remaining previously used features. Furthermore, k-NN, SVM, and ANN classifiers trained with 3D FEbest features outperformed current continuous B-cell epitope prediction servers, namely ABCPred, BCPred, and the AAP method.

Besides testing the energy-related features with k-NN, SVM, and ANN, I also tested them with hierarchical learning algorithms such as the C4.5 decision tree. However, the energy-related features were less effective for predicting continuous B-cell epitopes when used within hierarchical classifiers (results shown in supplementary material, Table 2-10). In fact, when I analyzed the information gain of the energy-related features, no individual feature stood out as particularly relevant to the prediction of continuous B-cell epitopes. Instead, the energy-related features achieved their best predictive performance in classification systems that consider the full set of input features jointly, as observed with k-NN, SVM, and ANN.
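The information-gain analysis mentioned above can be sketched as follows. The sketch assumes binary epitope labels and a single threshold split per continuous feature. C4.5 itself ranks splits by the gain ratio, a normalized variant of this quantity. The XOR-style toy data are hypothetical and are not data from this study. In that data, each feature alone yields zero gain, yet the two features together determine the label exactly. This illustrates, in general terms, how per-feature relevance scores can understate features that are informative only in combination.

```python
from collections import Counter
from math import log2
from typing import Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, information gain) of the best split `value <= threshold`.

    If no split improves on the parent entropy, the threshold is NaN and the gain 0.0.
    """
    base = entropy(labels)
    n = len(labels)
    best = (float("nan"), 0.0)
    # Exclude the largest value so that neither side of the split is empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical XOR-style toy data: the label is 1 exactly when x1 != x2.
x1 = [0.0, 0.0, 1.0, 1.0]
x2 = [0.0, 1.0, 0.0, 1.0]
y = [0, 1, 1, 0]
print(best_split_gain(x1, y)[1], best_split_gain(x2, y)[1])  # both 0.0
```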

As with most current machine learning methods, the development of energy-related features required specifying a peptide length. For instance, Chen et al. computed the AAP antigenicity scale for peptides 20 amino acids in length [15]. Similarly, the ABCPred classifier developed by Saha and Raghava was trained with input peptides of fixed length, and its reported optimal performance was achieved with a data set of peptides 16 amino acids in length [11]. In this study, two distance measures were considered: the 1D distance, measured in amino acids along the sequence, and the 3D distance, measured in ångströms (Å). The 1D distance measure was intended to identify the center of a continuous B-cell epitope. The 3D distance measure was intended to identify the core of a three-dimensional region with which an antibody may interact. The 1D and 3D features were combined to determine whether, together, they could estimate the likelihood that a particular peptide is a B-cell epitope. Since epitopes typically range from 3 to 30 amino acids in length, I defined 1D windows from 6 to 30 amino acids in increments of 4 amino acids (6, 10, 14, 18, 22, 26, and 30). The 3D window was defined as 3, 5, or 10 Å.
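The window definitions above can be made concrete with a short sketch. It assumes each residue is represented by a single 3D coordinate. The Cα atom is one common choice, but this is an assumption, since the representative atom is not specified here. The sketch also assumes that 1D windows are clipped at the chain termini. The helper names and the example coordinates are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Window sizes described in the text.
WINDOW_1D = list(range(6, 31, 4))  # 6, 10, 14, 18, 22, 26, 30 residues
RADII_3D = [3.0, 5.0, 10.0]        # distances in ångströms


def window_1d(center: int, size: int, length: int) -> List[int]:
    """Indices of a `size`-residue window around `center`, clipped to the chain.

    For even sizes the window holds one more residue before the center than after it.
    """
    start = center - size // 2
    return [i for i in range(start, start + size) if 0 <= i < length]


def neighborhood_3d(center: int, coords: Sequence[Coord], radius: float) -> List[int]:
    """Indices of residues whose representative coordinate lies within `radius` Å of the center residue."""
    origin = coords[center]
    return [i for i, c in enumerate(coords) if dist(origin, c) <= radius]


# Hypothetical coordinates (Å) for four residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 4.0, 0.0)]
for r in RADII_3D:
    print(r, neighborhood_3d(0, coords, r))
```

A residue can fall inside a 3D neighborhood while lying far away along the sequence, as residue 3 does here. This is what lets the 3D measure capture a spatial core rather than a contiguous stretch.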

The results showed that 3D features work better than 1D features in the identification of continuous B-cell epitopes. Furthermore, the results demonstrated that classifier performance depended on the combined information collected from variable window sizes rather than on any specific window size. One reason for this outcome is that B-cell epitopes do not have a fixed length. Using a fixed-length window for prediction can therefore misrepresent the peptide being analyzed and reduce overall prediction accuracy. The drawback of determining epitopes from information collected across variable window sizes is that the boundaries of the predicted epitopes may be imprecise. For the purpose of vaccine development, the overall improvement in accuracy outweighs this drawback. Peptides of slightly different lengths may still elicit the production of anti-peptide antibodies, which would in turn bind the native antigenic protein.

The results also indicated that the regions neighboring B-cell epitopes carry useful information that can help the classifier identify epitopes. This is especially interesting because the neighborhood region has been shown to play an important role in other continuous B-cell epitope prediction studies. For example, Sollner et al. [13] showed that neighborhood regions exhibit certain patterns in the primary sequence. Furthermore, experimental studies have shown that mutating the neighborhood region of a B-cell epitope may alter the structure of the epitope and thereby affect antibody binding [36]. The results of this study showed that free energy changes induced by point mutations contribute to the prediction of continuous B-cell epitopes. This holds both for mutations within epitopes and for those in the adjacent neighborhood regions.
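One way to combine information from variable window sizes, rather than committing to a single size, is to compute one summary per window and concatenate the results into a single feature vector. The sketch below assumes a hypothetical per-residue score, such as an energy-derived value, and uses the window mean as the summary. Neither choice is taken from this study, and the function name is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def multi_window_features(scores: Sequence[float], center: int, sizes: Sequence[int]) -> List[float]:
    """Concatenate the mean per-residue score over each window size around `center`.

    Windows are clipped at the sequence ends. Each window always contains
    `center`, so it is never empty.
    """
    n = len(scores)
    features: List[float] = []
    for size in sizes:
        start = center - size // 2
        window = [scores[i] for i in range(start, start + size) if 0 <= i < n]
        features.append(mean(window))
    return features


# Hypothetical scores: low in the first half of the chain, high in the second half.
scores = [1.0] * 5 + [3.0] * 5
print(multi_window_features(scores, 6, [2, 6, 10]))
```

Each window size contributes its own entry, so a classifier such as k-NN, SVM, or ANN receives all scales at once and can weight them jointly. Larger windows also reach into the flanking residues, which is one simple way neighborhood information can enter the feature vector.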
