Results - Predicting B-Cell Epitopes Based on Protein Free Energy

Chapter 2 Manuscript

2.3 Results

Tenfold cross-validation

I used stratified tenfold cross-validation. The dataset was randomly divided into ten equal subsets, each containing epitopes and non-epitopes in a 1:1 ratio. Nine of the ten subsets were used to train the classifier, and the remaining subset was used to test it. This procedure was repeated ten times, with each subset used exactly once as the test data. The results of five independent tenfold runs were averaged to produce a single value, which represents the estimated performance of the classifier.
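The protocol above can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes binary labels (1 = epitope, 0 = non-epitope), builds stratified folds by shuffling each class and dealing it round-robin across the folds, and takes a hypothetical `train_and_score(train_idx, test_idx)` callback that stands in for fitting and scoring k-NN, SVM, or ANN. Each of the five runs uses a different random split, and the per-run means are averaged into one estimate.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical callback: fit a classifier on train_idx, return its score on test_idx.
Scorer = Callable[[List[int], List[int]], float]


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve the class ratio.

    Each class is shuffled and dealt round-robin across the folds, so a
    1:1 epitope/non-epitope dataset yields balanced folds.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def repeated_cv(labels: Sequence[int], train_and_score: Scorer,
                k: int = 10, repeats: int = 5) -> float:
    """Average score over `repeats` independent k-fold cross-validation runs."""
    run_means = []
    for r in range(repeats):
        folds = stratified_folds(labels, k, seed=r)  # new random split per run
        fold_scores = []
        for t, test_idx in enumerate(folds):
            # Every fold except the t-th one forms the training set.
            train_idx = [i for f, fold in enumerate(folds) if f != t for i in fold]
            fold_scores.append(train_and_score(train_idx, test_idx))
        run_means.append(mean(fold_scores))  # one value per tenfold run
    return mean(run_means)  # single averaged estimate
```

For example, with 200 epitopes and 200 non-epitopes (`labels = [1] * 200 + [0] * 200`), each fold holds 20 examples of each class.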

Performance of energy-related features in selected classifiers

In this study, 44 energy-related features were developed for continuous B-cell epitope prediction.

First, the energy-related features were tested with learning algorithms that have previously shown prominent performance in predicting continuous B-cell epitopes, namely k-NN, SVM, and ANN. The performance of the classifiers trained with energy-related features is shown in Table 2-3. The best performance was achieved by an ANN with a single hidden layer of 23 units, which reached 68.6% accuracy, 66.7% specificity, and 70.5% sensitivity. During this initial evaluation of the energy-related features, I tested a number of settings for the learning algorithms, but accuracy did not improve further (data not shown).


Table 2-3. Performance of k-NN, SVM, and ANN trained with the energy-related features.

Classifier Accuracy (%) Specificity (%) Sensitivity (%)

k-NN 59.0 59.5 58.5

SVM 60.7 63.4 58.0

ANN 68.6 66.7 70.5
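The three metrics in Table 2-3 can be computed from confusion-matrix counts, treating epitopes as the positive class. The sketch below uses hypothetical counts, not the counts from this study. It also shows why, with a 1:1 ratio of epitopes to non-epitopes, accuracy equals the mean of specificity and sensitivity, which is consistent with every row of Table 2-3 (for example, (66.7 + 70.5) / 2 = 68.6 for the ANN).

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, specificity and sensitivity in percent (epitope = positive class)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,  # all correct predictions
        "specificity": 100.0 * tn / (tn + fp),  # non-epitopes correctly rejected
        "sensitivity": 100.0 * tp / (tp + fn),  # epitopes correctly detected
    }


# Hypothetical counts on a balanced set of 200 epitopes and 200 non-epitopes.
m = classification_metrics(tp=141, tn=133, fp=67, fn=59)
print(m)  # {'accuracy': 68.5, 'specificity': 66.5, 'sensitivity': 70.5}
```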

Relevance of energy-related features

To further assess the relevance of the three types of energy features (FE_avg, FE_diff, and FE_ss), I trained and tested the classifiers with different subsets of the energy-related features. As shown in Table 2-4, the extent to which classification performance was affected varied among k-NN, SVM, and ANN. However, for all three classifiers, the combination of FE_diff and FE_ss achieved accuracy similar to that obtained using all 44 energy-related features (FE_avg, FE_diff, and FE_ss combined).

In fact, for the ANN classifier, the accuracy achieved using FE_diff and FE_ss (72.6%) exceeded that achieved using all the energy-related features combined (68.6%). The combination of FE_diff and FE_ss will be referred to collectively as FE_best for the rest of the study.
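The subset comparison in Table 2-4 amounts to evaluating every non-empty combination of the three feature groups. A minimal sketch, assuming a hypothetical `evaluate` callback that would train and cross-validate a classifier on the columns of the given groups; here the ANN accuracies from Table 2-4 stand in for that callback, and the highest-scoring subset is the one named FE_best.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

GROUPS = ("FE_avg", "FE_diff", "FE_ss")


def feature_subsets(groups: Sequence[str] = GROUPS) -> List[Tuple[str, ...]]:
    """All non-empty combinations of feature groups, single groups first."""
    return [c for r in range(1, len(groups) + 1) for c in combinations(groups, r)]


def evaluate_subsets(evaluate: Callable[[Tuple[str, ...]], float],
                     groups: Sequence[str] = GROUPS) -> Dict[str, float]:
    """Score every subset with a hypothetical train-and-cross-validate callback."""
    return {" + ".join(s): evaluate(s) for s in feature_subsets(groups)}


# ANN accuracies (%) from Table 2-4, used as a stand-in for `evaluate`.
ann_accuracy = {
    "FE_avg": 50.1, "FE_diff": 63.4, "FE_ss": 54.5,
    "FE_avg + FE_diff": 60.3, "FE_avg + FE_ss": 54.1, "FE_diff + FE_ss": 72.6,
    "FE_avg + FE_diff + FE_ss": 68.6,
}
results = evaluate_subsets(lambda s: ann_accuracy[" + ".join(s)])
print(max(results, key=results.get))  # FE_diff + FE_ss
```

The order produced by `itertools.combinations` matches the row order of Table 2-4.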

As a first step in analyzing the sensitivity of the classifiers to window size, I compared the performance of 1D FE_best features with that of 3D FE_best features. The classification results based on 1D or 3D FE_best features are shown in Table 2-5. The classifiers trained with 3D FE_best features achieved higher accuracy than those trained with 1D FE_best features. That the accuracy achieved with 3D FE_best features is comparable to the accuracy achieved using both 1D and 3D FE_best features suggests that the 3D FE_best features play an important role in epitope prediction. To further test the sensitivity of the classifiers to 3D window size, I performed an ablation study by removing one 3D FE_best feature at a time and analyzing the performance of classifiers trained with the remaining features (Table 2-9 in the supplementary material). The results show that removing any single 3D FE_best feature did not markedly affect the performance of the classifiers. Rather, predictive performance was best when the classifiers were trained with the entire set of 3D FE_best features. This may be because B-cell epitopes do not have a fixed length; continuous B-cell epitope predictions based on any single window size may therefore fail to detect epitopes of diverse lengths.
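The leave-one-feature-out ablation described above can be sketched as below. This is an illustration under assumptions: the feature names are hypothetical placeholders for the individual 3D FE_best window features, and `evaluate` is a hypothetical callback returning cross-validated accuracy for a list of features. The function reports how much the score drops when each feature is removed; uniformly small drops, with the full set scoring best, correspond to the pattern reported for Table 2-9.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_out_ablation(features: Sequence[str],
                           evaluate: Callable[[List[str]], float]) -> Dict[str, float]:
    """Drop in score when each feature is removed in turn.

    A positive value means the removed feature was helping; values near zero
    mean no single feature is critical on its own.
    """
    baseline = evaluate(list(features))
    return {f: baseline - evaluate([g for g in features if g != f]) for f in features}


# Toy evaluator (hypothetical): every window feature adds the same amount.
drops = leave_one_out_ablation(["w1", "w2", "w3"], lambda fs: 70.0 + 2.0 * len(fs))
print(drops)  # {'w1': 2.0, 'w2': 2.0, 'w3': 2.0}
```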

Table 2-4. Performance of k-NN, SVM, and ANN trained and tested with different subsets of energy-related features.

k-NN

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

FE_avg 55.8 57.3 54.3

SVM

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

FE_avg 48.2 42.3 54.1

ANN

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

FE_avg 50.1 2.0 98.1

FE_diff 63.4 58.9 67.8

FE_ss 54.5 46.4 62.5

FE_avg + FE_diff 60.3 55.3 65.3

FE_avg + FE_ss 54.1 47.4 60.7

FE_diff + FE_ss 72.6 72.9 72.3

FE_avg + FE_diff + FE_ss 68.6 66.7 70.5

Table 2-5. Performance of k-NN, SVM, and ANN trained with 1D or 3D FE_best features.

k-NN

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

1D FE_best 55.0 54.9 55.1

3D FE_best 74.3 79.6 68.9

SVM

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

1D FE_best 56.1 60.7 51.5

3D FE_best 66.1 76.7 55.5

ANN

Energy-related features Accuracy (%) Specificity (%) Sensitivity (%)

1D FE_best 56.0 51.3 60.6

3D FE_best 80.0 83.9 76.0


Comparison with previous features

To demonstrate the significance of the energy-related features, I compared them with other features previously used to predict continuous B-cell epitopes. I included 178 previously used features for comparison, as summarized in Table 2-2. They were derived from amino acid propensity scales, word probabilities, sequence complexity, and AAP antigenicity scales. In this comparative study, the classifiers were trained on each of these four types of features, as well as on the novel energy-related features. Among the previously developed features, k-NN, SVM, and ANN consistently performed best when trained with the AAP antigenicity scale, as shown in Table 2-6. Both the 3D FE_best features and the AAP antigenicity scale achieved greater than 60% accuracy with k-NN, SVM, and ANN. When I combined the 3D FE_best and AAP antigenicity scale features, k-NN, SVM, and ANN achieved accuracies of 77.2%, 78.6%, and 81.4%, respectively. The AAP antigenicity scale and the 3D FE_best features can therefore work together in identifying continuous B-cell epitopes, and the combination may even produce a complementary effect, as observed in the SVM classifier, whose combined accuracy (78.6%) clearly exceeded that of either feature set alone (70.1% for AAP and 66.1% for 3D FE_best).
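The text does not state how the two feature sets were merged; a common approach, assumed here, is to concatenate each peptide's feature vectors. The sketch below keys vectors by a hypothetical peptide ID and refuses to combine sets that describe different peptides, so that rows cannot silently misalign.

```python
from typing import Dict, List, Sequence


def combine_feature_sets(*feature_sets: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Concatenate per-peptide feature vectors from several feature sets.

    Each argument maps a peptide ID to its feature vector; all sets must cover
    exactly the same peptides.
    """
    ids = set(feature_sets[0])
    for fs in feature_sets[1:]:
        if set(fs) != ids:
            raise ValueError("feature sets describe different peptides")
    # Keep the first set's peptide order and append each set's features in turn.
    return {pid: [x for fs in feature_sets for x in fs[pid]] for pid in feature_sets[0]}


# Hypothetical values: two 3D FE_best features and one AAP score per peptide.
fe_best = {"p1": [0.2, -1.3], "p2": [0.5, 0.1]}
aap = {"p1": [0.7], "p2": [0.4]}
print(combine_feature_sets(fe_best, aap))  # {'p1': [0.2, -1.3, 0.7], 'p2': [0.5, 0.1, 0.4]}
```

Because k-NN and SVM are sensitive to feature scales, a normalization step would usually accompany such a concatenation; whether one was applied here is not stated.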

Table 2-6. Performance of k-NN, SVM, and ANN trained with previously used features or the novel energy-related features.

k-NN

Feature Accuracy (%) Specificity (%) Sensitivity (%)

Amino acid propensity scale 57.1 55.1 59.0

Word probability 55.3 54.4 56.1

Sequence complexity 52.0 52.5 51.4

AAP 61.0 60.3 61.6

3D FE_best 74.3 79.6 68.9

3D FE_best + AAP 77.2 80.5 73.8


SVM

Feature Accuracy (%) Specificity (%) Sensitivity (%)

Amino acid propensity scale 62.9 64.6 61.1

Word probability 52.8 54.2 51.4

Sequence complexity 47.6 52.5 42.7

AAP 70.1 69.7 70.5

3D FE_best 66.1 76.7 55.5

3D FE_best + AAP 78.6 79.8 77.4

ANN

Feature Accuracy (%) Specificity (%) Sensitivity (%)

Amino acid propensity scale 57.5 55.1 59.9

Word probability 50.9 29.4 72.3

In addition to comparing the energy-related features with previously used features, I also compared the k-NN, SVM, and ANN classifiers trained with 3D FE_best features against current epitope predictors, namely ABCPred [11], BCPred [16], and the AAP method [15, 16]. Table 2-7 shows the performance of the classifiers trained with 3D FE_best features alongside that of the prediction servers. When testing examples are submitted to the trained servers, the servers return scores between 0 and 1.0, where a higher score indicates a higher probability that the peptide is a B-cell epitope. The threshold was set at 0.5 for ABCPred, as suggested by Saha and Raghava [11]. For BCPred and the AAP method, the corresponding publications did not indicate optimal threshold values; in this study, both performed best when the threshold was set at 0.9. Because BCPred and the AAP method frequently returned scores close to or equal to 1.0, both servers showed higher sensitivity than the other classifiers but suffered from severely low specificity. Such a high false positive rate is especially impractical for vaccine development, because the main advantage of computational prediction, the reduction of time and cost, is diminished when the prediction returns too many candidate peptides for experimental screening. The k-NN and ANN classifiers trained with 3D FE_best features outperformed the current continuous B-cell epitope prediction servers in both specificity and accuracy, whereas the SVM trained with 3D FE_best features performed comparably to the current servers.
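Turning server scores into labels and choosing a threshold can be sketched as follows. This is a hedged illustration, not the procedure used to query the servers: it assumes a peptide is called an epitope when its score is at least the threshold (the source does not say whether the comparison is strict), scans candidate thresholds 0.1 to 0.9, and keeps the one with the highest accuracy. The toy scores cluster near 1.0, so a low threshold labels every peptide as an epitope, giving perfect sensitivity and zero specificity.

```python
from typing import List, Optional, Sequence, Tuple


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a peptide as an epitope (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Percentage of peptides whose predicted label matches the true label."""
    return 100.0 * sum(y == p for y, p in zip(labels, preds)) / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the most accurate candidate threshold."""
    if candidates is None:
        candidates = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    best = max(candidates, key=lambda t: accuracy(labels, predict(scores, t)))
    return best, accuracy(labels, predict(scores, best))


# Hypothetical server scores clustered near 1.0 (1 = epitope, 0 = non-epitope).
scores = [0.95, 0.92, 0.97, 0.60, 0.85, 0.99]
labels = [1, 1, 1, 0, 0, 0]
print(predict(scores, 0.5))            # [1, 1, 1, 1, 1, 1]
print(best_threshold(scores, labels))  # threshold 0.9, accuracy about 83.3%
```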

Table 2-7. Performance of k-NN, SVM, and ANN trained with 3D FE_best features, and of the current epitope prediction servers ABCPred, BCPred, and the AAP method.

Method Accuracy (%) Specificity (%) Sensitivity (%)

k-NN 74.3 79.6 68.9

SVM 66.1 76.7 55.5

ANN 80.0 83.9 76.0

ABCPred 52.3 67.5 37.0

BCPred 67.2 36.4 98.0

AAP method 65.1 30.1 100.0

Testing on an independent dataset

To further evaluate the classifiers trained with 3D FE_best features, I measured their predictive performance on an independent dataset retrieved from the AntiJen database [27]. To ensure that the testing dataset was independent of the training dataset, protein epitopes were selected from the AntiJen database such that they did not overlap with those selected from the Bcipep database. The classifiers were trained with the dataset retrieved from the Bcipep database and tested on the dataset retrieved from the AntiJen database. The k-NN, SVM, and ANN achieved accuracies of 62.7%, 61.0%, and 67.5%, respectively, as shown in Table 2-8. For comparison, I also evaluated ABCPred, BCPred, and the AAP method, which, like my method, were trained on datasets obtained from the Bcipep database. ABCPred achieved an average accuracy of 47.2% using a threshold of 0.5, and BCPred and the AAP method achieved accuracies of 58.5% and 56.5%, respectively, using a threshold of 0.9.
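The source does not define how overlap between AntiJen and Bcipep epitopes was judged. The sketch below assumes, for illustration only, that two peptides overlap when they share a common substring of at least `k` residues, with `k` capped at the shorter peptide's length so that a short peptide contained in a training peptide is also excluded; the default `k = 8` is a hypothetical choice.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlaps(a: str, b: str, k: int) -> bool:
    """True if a and b share a substring of length min(k, len(a), len(b))."""
    m = min(k, len(a), len(b))
    return bool(kmers(a, m) & kmers(b, m))


def independent_test_set(candidates: Iterable[str], training: Iterable[str],
                         k: int = 8) -> List[str]:
    """Keep candidate peptides that overlap no training peptide (k is hypothetical)."""
    training = list(training)
    return [p for p in candidates if not any(overlaps(p, t, k) for t in training)]


# Hypothetical sequences: one shares an 8-mer, one is contained, one is unrelated.
train = ["ACDEFGHIKLMN"]
cands = ["XXACDEFGHIYY", "PQRSTVWYPQRS", "DEFG"]
print(independent_test_set(cands, train))  # ['PQRSTVWYPQRS']
```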

One possible reason for the drop in performance of the classifiers trained with 3D FE_best features is the difference in epitope density between the two datasets. In the dataset retrieved from Bcipep, 200 epitopes were distributed across 145 proteins (about 1.4 epitopes per protein), whereas in the dataset retrieved from the AntiJen database, 85 epitopes were distributed across 45 proteins (about 1.9 epitopes per protein).

Since the FE_diff features are constructed by comparing the free energy of structures mutated in the epitope region with that of structures mutated in non-epitope regions, the higher epitope density in the dataset retrieved from the AntiJen database affects the relevance of the FE_diff features.

In addition, the classifiers may have over-fitted the training dataset from the Bcipep database to some extent. Since the published servers were also trained with peptides from the Bcipep database, it is likely that they over-fitted examples from that database as well.

Nonetheless, the results indicate that classifiers trained with 3D FE_best features can identify peptides as potential B-cell epitopes with reasonably high accuracy.


Table 2-8. Performance of current servers, and of k-NN, SVM, and ANN trained with 3D FE_best features, on a dataset retrieved from the AntiJen database.

Method Accuracy (%) Specificity (%) Sensitivity (%)

k-NN 62.7 63.7 61.6

SVM 61.0 61.4 60.5

ANN 67.5 68.6 66.4

ABCPred 47.2 65.9 28.5

BCPred 58.5 62.8 54.2

AAP method 56.5 65.6 47.4


From the document "Predicting B-Cell Epitopes Based on Protein Free Energy" (pp. 36-45)