In this thesis, we proposed a nearest neighbor algorithm (IBL) that uses coding and weighting methods to classify data with symbolic domains. In direct comparisons on several well-known data sets under different testing methodologies, our k-NN (MVDM) performed better than back propagation, ID3, KBANN, and other methods.
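As a reminder of how a k-NN classifier with the modified value difference metric (MVDM) operates on symbolic features, the following is a minimal sketch and not the implementation evaluated in this thesis. It assumes the unweighted form of the metric, in which the distance between two values of a feature is the sum over classes of the absolute difference of their class-conditional probabilities. The function names (`fit_mvdm`, `mvdm_distance`, `knn_predict`), the uniform fallback for unseen values, and the toy data are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_mvdm(X, y):
    """Estimate P(class | feature value) for each feature from training data."""
    classes = sorted(set(y))
    n_features = len(X[0])
    counts = [defaultdict(Counter) for _ in range(n_features)]
    for row, label in zip(X, y):
        for f, value in enumerate(row):
            counts[f][value][label] += 1
    probs = []
    for f in range(n_features):
        table = {}
        for value, ctr in counts[f].items():
            total = sum(ctr.values())
            table[value] = [ctr[c] / total for c in classes]
        probs.append(table)
    return classes, probs


def mvdm_distance(a, b, probs, n_classes):
    """Sum over features of sum_c |P(c | a_f) - P(c | b_f)| (unweighted)."""
    # Assumption: a value never seen in training gets a uniform distribution.
    uniform = [1.0 / n_classes] * n_classes
    total = 0.0
    for f, (va, vb) in enumerate(zip(a, b)):
        pa = probs[f].get(va, uniform)
        pb = probs[f].get(vb, uniform)
        total += sum(abs(p - q) for p, q in zip(pa, pb))
    return total


def knn_predict(query, X, y, classes, probs, k=3):
    """Majority vote among the k training instances nearest to the query."""
    ranked = sorted(
        range(len(X)),
        key=lambda i: mvdm_distance(query, X[i], probs, len(classes)),
    )
    votes = Counter(y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the first feature determines the class,
# the second feature carries no class information.
X_train = [("a", "x"), ("a", "y"), ("c", "x"), ("c", "y")]
y_train = ["pos", "pos", "neg", "neg"]
classes, probs = fit_mvdm(X_train, y_train)
prediction = knn_predict(("a", "y"), X_train, y_train, classes, probs, k=3)
```

In this toy example the two values of the second feature have identical class distributions, so they contribute zero distance, which is the behavior that makes a value difference metric useful for symbolic attributes. Feature or exemplar weights, if used, would multiply the per-feature terms inside `mvdm_distance`.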
With respect to prototypes, we proposed a symbolic nearest mean classifier. Its prototypes are learned by modifying the minimum distance classifier to handle symbolic domains, to weight attributes, and to learn one prototype for each class.
Furthermore, we consider the contributions of all prototypes to each class and design a fuzzy prototype to serve as the mean of each class. Both algorithms can be improved by the weighting method. We provide comparisons with other algorithms under distinct prediction methodologies and show that our implementations performed as well as (or better than) C4.5, C5.0, PEBLS, and BAYES, among others. In addition, nearest neighbor offers clear advantages in that it is much faster to train and its representation is relatively easy to interpret. No one yet knows how to interpret the networks of weights learned by neural nets. Decision trees are somewhat easier to interpret, but it is hard to predict the impact of a new example on the structure of the tree. Sometimes one new example makes no difference at all, and at other times it may radically change a large portion of the tree. Neural nets and decision trees do, however, have advantages that nearest neighbor does not. In particular, their classification time is fast (dependent only on the depth of the net or tree, not on the size of the input). Based on classification accuracy, though, it is not clear that other learning techniques have an advantage over nearest-neighbor methods.
With respect to nearest neighbor learning, we have shown how weighting can improve accuracy. The information gain (IG) weight of a feature is in effect a probability-weighted average of the informativity of the feature's different values, and weighting exemplars can reduce the impact of unreliable examples. The nearest neighbor algorithm is one of the simplest learning methods known, and yet no other algorithm has been shown to outperform it consistently. Taken together, these results indicate that continued research on extending and improving nearest neighbor learning algorithms should prove fruitful.
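The description of the information gain weight as a probability-weighted average can be made concrete with a short sketch. It assumes the standard definition, in which the weight of a feature is the class entropy minus the average entropy within each feature value, weighted by the probability of that value. The function names and the toy data are illustrative and are not taken from the thesis implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one feature: H(C) minus the P(v)-weighted average of H(C | v)."""
    n = len(labels)
    by_value = {}
    for v, label in zip(values, labels):
        by_value.setdefault(v, []).append(label)
    # Each value's class entropy is weighted by how often the value occurs.
    remainder = sum(len(subset) / n * entropy(subset) for subset in by_value.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one feature that determines the class, one that does not.
labels = ["pos", "pos", "neg", "neg"]
ig_informative = information_gain(["a", "a", "c", "c"], labels)
ig_uninformative = information_gain(["x", "y", "x", "y"], labels)
```

A feature whose values separate the classes perfectly receives the full class entropy as its weight, while a feature whose values leave the class distribution unchanged receives a weight of zero and therefore does not affect the distance computation.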