Chapter 4. Simulation and Experiment
4.2 Simulation and Results
4.2.4 Comparison by Leave-One-Out Methodology
We apply the leave-one-out strategy to the Promoter data set with our three algorithms (FSNM, SNM, and k-NN (MVDM)) and compare them with nearest neighbor (NN) using the overlap metric (which counts the number of feature-value mismatches between two examples), PEBLS, and BAYES (a Bayesian classifier) [37].
The comparison is shown in Table XV. All three of our algorithms achieve higher accuracy than the other algorithms, although FSNM exceeds BAYES only marginally (91.51 versus 91.50).
TABLE XV
ACCURACIES OF PROMOTER DATA SET ON DIFFERENT ALGORITHMS BY LEAVE-ONE-OUT

Algorithm       Promoter   Rank
BAYES [37]      91.50      4
PEBLS [37]      90.60      5
NN [37]         80.50      6
SNM             93.34      2
FSNM            91.51      3
5-NN (MVDM)     94.43      1
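The leave-one-out procedure and the overlap metric used for the NN baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names and the toy data are hypothetical, and the classifier is a plain 1-NN rather than FSNM, SNM, or k-NN (MVDM).

```python
from typing import List, Sequence


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the feature positions at which two examples disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_label(query, examples, labels):
    """Return the label of the stored example closest to `query`."""
    best = min(range(len(examples)),
               key=lambda i: overlap_distance(query, examples[i]))
    return labels[best]


def leave_one_out_accuracy(examples: List[Sequence[str]], labels: List[str]) -> float:
    """Hold out each example once, classify it from all the others,
    and return the fraction classified correctly."""
    correct = 0
    for i in range(len(examples)):
        train_x = examples[:i] + examples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(examples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(examples)


# Toy data invented for illustration only (not the Promoter data set).
toy_x = [list("aacg"), list("aacc"), list("ttgg"), list("ttga")]
toy_y = ["+", "+", "-", "-"]
toy_accuracy = leave_one_out_accuracy(toy_x, toy_y)
```

With n examples the classifier is evaluated n times, each time on a single held-out example, so every example serves as a test case exactly once.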
4.2.5 Comparison by Ten-Fold Cross-Validation Methodology
In addition, we evaluate the Splice data set by ten-fold cross-validation on 1000 examples randomly selected from the complete set of 3190 (after deleting 15 missing values). We randomize the permutation of the data set assigned to each fold so that we can choose the best result, and we compare with other algorithms such as KBANN [38]–[40], PEBLS, and ID3 [41] (all of those experiments, except BRAIN, were carried out at the University of Wisconsin [35], [42]).
Table XVI shows that k-NN (MVDM) outperforms the other algorithms, whereas SNM and FSNM outperform only NN (overlap).
Moreover, we test the Lenses data set by ten-fold cross-validation, again randomizing the permutation of the data set assigned to each fold so that we can choose the best result, and compare with well-known algorithms such as C4.5 and C5.0 [43]. Table XVII shows that our algorithms are superior to the other algorithms.
TABLE XVI

Nearest Neighbor   82.72   11
BRAIN              95.67   2
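Ten-fold cross-validation over a randomly permuted data set can be sketched as follows. This is a hedged illustration only: `k_fold_accuracy`, the `classify(train_x, train_y, query)` callback, the `majority_class` placeholder, and the toy data are all hypothetical names introduced here, and the sketch assumes the data set has at least `k` examples.

```python
import random
from collections import Counter


def k_fold_accuracy(examples, labels, classify, k=10, seed=0):
    """Shuffle the data once, split it into k folds, and average the
    per-fold accuracy of `classify(train_x, train_y, query)`."""
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)        # random permutation of the data
    folds = [order[i::k] for i in range(k)]   # k nearly equal folds
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        train_x = [examples[i] for i in train]
        train_y = [labels[i] for i in train]
        hits = sum(classify(train_x, train_y, examples[i]) == labels[i]
                   for i in held_out)
        fold_scores.append(hits / len(held_out))
    return sum(fold_scores) / k


def majority_class(train_x, train_y, query):
    """Placeholder classifier: always predict the most common training label."""
    return Counter(train_y).most_common(1)[0][0]


# Toy data invented for illustration: 20 examples, 15 of them in class "N".
toy_x = [[str(i)] for i in range(20)]
toy_y = ["N"] * 15 + ["EI"] * 5
toy_score = k_fold_accuracy(toy_x, toy_y, majority_class, k=10, seed=0)
```

Changing `seed` changes the permutation and therefore the fold assignment, so calling the function with several seeds and keeping the highest score is one possible reading of choosing the best result over randomized permutations.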
4.2.6 Comparison by Another Methodology
Finally, we evaluate the Promoter data set with our three algorithms by averaging over 30 runs, each of which randomly chooses two-thirds of the data as the training set and uses the remainder as the test set; we also randomize the permutation of the data set for each partition to choose the best accuracy. Here, we compare with PEBLS, C4.5, and SNM [11], [12]. Table XVIII shows that k-NN (MVDM) performs better than the others.
TABLE XVIII
ACCURACIES OF PROMOTER DATA SETS ON DIFFERENT ALGORITHMS BY ANOTHER METHODOLOGY

Algorithm       Promoter   Rank
C4.5            74.30      6
SNM (Kibler)    91.40      3
PEBLS           89.40      4
SNM             93.34      2
FSNM            89.26      5
3-NN (MVDM)     96.61      1
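The repeated random-split protocol described in this section (two-thirds for training, one-third for testing, 30 runs) can be sketched as follows. The function names, the `classify` callback, the `majority_class` placeholder, and the toy data are assumptions made for illustration and do not come from the thesis.

```python
import random
from collections import Counter


def holdout_accuracy(examples, labels, classify, rng, train_fraction=2 / 3):
    """One run: shuffle, train on the first two-thirds, test on the remainder."""
    order = list(range(len(examples)))
    rng.shuffle(order)
    cut = round(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    train_x = [examples[i] for i in train]
    train_y = [labels[i] for i in train]
    hits = sum(classify(train_x, train_y, examples[i]) == labels[i] for i in test)
    return hits / len(test)


def repeated_holdout(examples, labels, classify, runs=30, seed=0):
    """Return (mean accuracy, best accuracy) over `runs` random splits."""
    rng = random.Random(seed)
    scores = [holdout_accuracy(examples, labels, classify, rng)
              for _ in range(runs)]
    return sum(scores) / runs, max(scores)


def majority_class(train_x, train_y, query):
    """Placeholder classifier: always predict the most common training label."""
    return Counter(train_y).most_common(1)[0][0]


# Toy data invented for illustration only (not the Promoter data set).
toy_x = [[c] for c in "aaaaaattt"]
toy_y = ["+"] * 6 + ["-"] * 3
mean_acc, best_acc = repeated_holdout(toy_x, toy_y, majority_class)
```

Returning both the mean and the maximum keeps the averaged figure separate from the best single run, which makes it explicit which of the two a reported accuracy refers to.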
4.3 Summary
In Sec. 4.2.1, we first list the information gain and gain ratio of the data sets that we test; adding the information gain weighting method boosts our accuracies. In Sec. 4.2.2, we compare our three algorithms (k-NN (MVDM), SNM, and FSNM) with and without information gain weighting by the leave-one-out strategy and find that k-NN (MVDM) is better than the other two algorithms. In Sec. 4.2.3, we compare the variance of the difference ratio between SNM and FSNM and find that the variance of FSNM is much larger than that of SNM; surprisingly, FSNM's performance is nevertheless close to that of SNM. In Sec. 4.2.4, we use the leave-one-out methodology and compare with the results of Kasif, Salzberg, Waltz, Rachlin, and Aha [37]; all of our algorithms perform better than theirs. In Sec. 4.2.5, we compare with Rampone [44] using ten-fold cross-validation and find that only k-NN (MVDM) is better than the other algorithms. Moreover, we compare with C4.5 and C5.0, and our performance is superior to theirs. Finally, in Sec. 4.2.6, we compare with Domingos and Pazzani [45] by averaging over 30 runs of randomly choosing two-thirds of the data as the training set and the remainder as the test set; k-NN (MVDM) is still the best in our experiments, and our SNM is comparable to PEBLS and to the SNM of [11], [12].
Chapter 5. Conclusion
In this thesis, we proposed a nearest neighbor algorithm (IBL) and used a sophisticated coding and weighting method to classify data with symbolic domains. In direct comparisons on several well-known data sets under different testing methodologies, our k-NN (MVDM) performed better than back propagation, ID3, KBANN, and other algorithms.
From the viewpoint of prototypes, we proposed a symbolic nearest mean classifier that modifies the minimum distance classifier to handle symbolic domains and attribute weighting and to learn one prototype for each class.
Furthermore, we consider the contributions of all prototypes to each class and design a fuzzy prototype to serve as the mean of each class. Both algorithms can be improved by the weighting method. We compare with other algorithms under distinct prediction methodologies and show that our implementations perform as well as (or better than) C4.5, C5.0, PEBLS, BAYES, and others. In addition, nearest neighbor offers clear advantages in that it is much faster to train and its representation is relatively easy to interpret. No one yet knows how to interpret the networks of weights learned by neural nets. Decision trees are somewhat easier to interpret, but it is hard to predict the impact of a new example on the structure of the tree. Sometimes one new example makes no difference at all, and at other times it may radically change a large portion
nearest neighbor does not. In addition, classification time is fast (dependent only on the depth of the net or tree, not on the size of the input). Based on classification accuracy, though, it is not clear that other learning techniques have an advantage over nearest-neighbor methods.
With respect to nearest neighbor learning, we have shown how weighting exemplars by information gain (IG), which is really a probability-weighted average of the informativity of the different values of a feature, can improve accuracy and reduce the impact of unreliable examples. The nearest neighbor algorithm is one of the simplest learning methods known, and yet no other algorithm has been shown to outperform it consistently. Taken together, these results indicate that continued research on extending and improving nearest neighbor learning algorithms should prove fruitful.
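The description of the IG weight as a probability-weighted average over feature values corresponds to the standard definition IG(f) = H(C) - sum over v of P(f = v) H(C | f = v). The sketch below computes it for one symbolic feature and applies the resulting weights to the overlap metric. It is an illustration under assumptions: the function names and toy data are hypothetical, and the way the weights enter the distance here (a weighted overlap) is a simplification that is not claimed to match how the thesis combines IG with MVDM, SNM, or FSNM.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(C) minus the probability-weighted average of H(C | feature = v)."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    remainder = sum((len(subset) / total) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def weighted_overlap(a, b, weights):
    """Overlap metric in which each mismatch costs that feature's weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Toy data invented for illustration: feature 0 determines the class,
# feature 1 carries no information about it.
toy_x = [("a", "c"), ("a", "g"), ("t", "c"), ("t", "g")]
toy_y = ["+", "+", "-", "-"]
toy_weights = [information_gain([x[f] for x in toy_x], toy_y) for f in range(2)]
```

In this toy example a mismatch on the uninformative feature contributes nothing to the weighted distance, whereas a mismatch on the informative feature contributes its full weight.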
References
[1] S. Salzberg, Learning with Nested Generalized Exemplars. Norwell, MA: Kluwer Academic Publishers, 1990.
[2] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inform. Theory, vol. 13, pp. 21–27, 1967.
[3] D. Aha, “Incremental, instance-based learning of independent and graded concept descriptions,” in Proc. of the Sixth International Workshop on Machine Learning, 1989, pp. 387–391.
[4] D. Aha and D. Kibler, “Noise-tolerant instance-based learning algorithms,” in Proc. 11th Int. Joint Conf. Artificial Intelligence, 1989, pp. 794–799.
[5] S. Salzberg, “Nested Hyper-rectangles for Exemplar-based Learning,” in K. P. Jantke, Ed., Analogical and Inductive Inference: International Workshop AII, 1989, pp. 184–201.
[6] S. Cost and S. Salzberg, “Exemplar-based Learning to Predict Protein Folding,” in Proc. of the Symposium on Computer Applications to Medical Care, 1990.
[7] G. Towell, J. Shavlik, and M. Noordewier, “Refinement of approximate domain theories by knowledge-based neural networks,” in Proc. 8th National Conf. Artificial Intelligence, 1990, pp. 861–866.
[8] S. Cost and S. Salzberg, “A weighted nearest neighbor algorithm for learning with symbolic features,” Machine Learning, vol. 10, pp. 57–78, 1993.
[9] C. Stanfill and D. Waltz, “Toward memory-based reasoning,” Communications of
[10] S. Salzberg, Learning with Nested Generalized Exemplars. Norwell, MA: Kluwer Academic Publishers, 1990.
[11] P. Datta and D. F. Kibler, “Symbolic Nearest Mean Classifiers,” in Proc. AAAI, IAAI, 1997, pp. 82–87.
[12] P. Datta and D. F. Kibler, “Learning Symbolic Prototypes,” in Proc. ICML, 1997, pp. 75–82.
[13] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: John Wiley & Sons, 1973.
[14] D. Aha, D. Kibler, and M. Albert, “Instance-based learning algorithms,” Machine Learning, vol. 6, pp. 37–66, 1991.
[15] J. Zhang, “Selecting typical instances in instance-based learning,” in Proc. 9th Int. Machine Learning Conf., 1992, pp. 470–479.
[16] D. Skalak, “Prototype and feature selection by sampling and random mutation hill climbing algorithms,” in Proc. 11th Int. Machine Learning Conf., 1994, pp. 293–301.
[17] P. Datta and D. Kibler, “Learning Prototypical Concept Descriptions,” in Proc. 12th Int. Machine Learning Conf., 1995, pp. 158–166.
[18] C. Cardie, “Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996, pp. 113–126.
[19] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, pp. 81–106, 1986.
[20] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[21] W. Daelemans and A. van den Bosch, “Generalization performance of backpropagation learning on a syllabification task,” in M. Drossaers and A. Nijholt (Eds.), Proc. of the 3rd Twente Workshop on Language Technology, 1992, pp. 27–37.
[22] J. Cendrowska, “PRISM: An algorithm for inducing modular rules,” International Journal of Man-Machine Studies, vol. 27, pp. 349–370, 1987.
[23] H. I. Witten and A. B. MacDonald, “Using concept learning for knowledge acquisition,” International Journal of Man-Machine Studies, vol. 27, pp. 349–370, 1988.
[24] C. Harley and R. Reynolds, “Analysis of E. Coli Promoter Sequences,” Nucleic Acids Research, vol. 15, pp. 2343–2361, 1987.
[25] G. Towell, J. Shavlik and M. Noordewier, “Refinement of Approximate Domain Theories by Knowledge-Based Artificial Neural Networks,” in Proc. of the 8th National Conf. on Artificial Intelligence, 1990, pp. 861–866.
[26] C. M. O’Neill, “Escherichia coli promoters: Consensus as it relates to spacing class, specificity, repeat substructure, and three dimensional organization,” Journal of Biological Chemistry, no. 264, pp. 5522–5530, 1989.
[27] C. M. O’Neill and F. Chiafari, “Escherichia coli promoters II. A spacing-class dependent promoter search protocol,” Journal of Biological Chemistry, no. 264, pp. 5531–5534, 1989.
[28] J. Ortega, “On the Informativeness of the DNA Promoter Sequences Domain Theory” (Research Note), vol. 2, pp. 361–367, 1995.
[29] K. D. Hawley and R. W. McClure, “Compilation and analysis of Escherichia Coli promoter DNA sequences,” Nucleic Acids Research, vol. 11, pp. 2237–2255,
[30] T. Record. Personal communication. 1989.
[31] S. Brunak, J. Engelbrecht, and S. Knudsen, “Prediction of the human mRNA donor and acceptor sites from the DNA Sequence,” J. Mol. Biol., vol. 220, pp. 49–65, 1991.
[32] M. O. Noordewier, G. G. Towell, and J. W. Shavlik, “Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences,” Advances in Neural Information Processing Systems, vol. 3, 1991.
[33] G. G. Towell, J. W. Shavlik and M. W. Craven, “Constructive Induction in Knowledge-Based Neural Networks,” in Proc. of the 8th International Machine Learning Workshop, 1991, pp. 213–217.
[34] G. G. Towell, “Symbolic Knowledge and Neural Networks: Insertion, Refinement, and Extraction,” PhD Thesis, University of Wisconsin – Madison, 1991.
[35] G. G. Towell and J. W. Shavlik, “Interpretation of Artificial Neural Networks: Mapping Knowledge-based Neural Networks into Rules,” in Advances in Neural Information Processing Systems, vol. 4, 1992.
[36] D. J. Watson, H. H. Hopkins, W. J. Roberts, A. J. Steitz, and M. A. Weiner, The Molecular Biology of the Gene. Benjamin-Cummings, Menlo Park, CA, 1987.
[37] S. Kasif, S. Salzberg, D. L. Waltz, J. Rachlin, and D. Aha, “A Probabilistic Framework for Memory-Based Reasoning,” Artificial Intelligence, vol. 104, no. 1–2, pp. 287–311, 1998.
[38] G. G. Towell, M. W. Craven, and J. W. Shavlik, “Constructive Induction in Knowledge-Based Neural Networks,” in Proc. of the 8th International Machine Learning Workshop, 1991, pp. 213–217.
[39] “Training Knowledge-Based Neural Networks to Recognize Genes in DNA Sequences,” in Proc. of the conf. on Advances in neural information processing systems, 1990, pp. 530–536.
[40] M. O. Noordewier, G. G. Towell, and J. W. Shavlik, “Training knowledge-based neural networks to recognize genes in DNA sequences,” in Advances in Neural Information Processing Systems, vol. 3, 1991.
[41] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, pp. 81–106, 1986.
[42] J. W. Shavlik, R. J. Mooney, and G. G. Towell, “Symbolic and Neural Learning Algorithms: An Experimental Comparison,” Machine Learning, vol. 6, pp. 111–143, 1991.
[43] J. S. Aguilar-Ruiz, J. C. Riquelme, and M. Toro, “Evolutionary Learning of Hierarchical Decision Rules,” IEEE Systems, Man and Cybernetics, Part B, vol. 33, pp. 324–331, 2003.
[44] S. Rampone, “Recognition of Splice-Junctions on DNA Sequences by BRAIN learning algorithm,” Bioinformatics, vol. 14, pp. 676–684, 1998.
[45] P. Domingos and M. Pazzani, “Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier,” Machine Learning, vol. 29, pp. 103–130, 1997.