Future Works - Modifying XCS with Cognitive Learning to Build a Dual-Mode Learning Mechanism Combining Knowledge Education and Machine Learning

Chapter 6. Conclusions

6.2 Future Works

Future work will address three issues. First, this work derives its notion of cognitive learning only roughly from cognitive psychology. Across the theory of cognition as a whole, many well-established psychological models, including models from cognitive psychology, are available, so a more efficient learning mechanism may be achievable by simulating those models computationally. Second, although this work is the first in this direction, we expect that more AI researchers will enhance their models by considering this kind of philosophical thinking. In addition, later models with better accuracy and performance might be substituted for XCS. Third, a model that considers more complex factors for the financial prediction problem should be developed. A well-considered model matters because applying the right factors to the right problem is what yields a remarkable outcome.

References

1. J. McCarthy. “Generality in Artificial Intelligence”, Communications of the ACM, 30(12), pp.1030-1035, Dec. 1987.

2. S. Ghirlanda, M. Enquist. “Artificial Neural Networks as Models of Stimulus Control”, Animal Behaviour, 56, pp.1383–1389, 1998.

3. S. Ghirlanda, M. Enquist. “The Geometry of Stimulus Control”, Animal Behaviour, 58(4), pp.695-706, 1999.

4. V. Chiew. “A Software Engineering Cognitive Knowledge Discovery Framework”, Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI’02), pp.163-172, Calgary, AB, Canada, Aug. 2002.

5. S. W. Wilson. “Classifier Fitness Based on Accuracy”, Evolutionary Computation, 3(2), pp.149-175, 1995.

6. J. Piaget. Structuralism (C. Maschler, Trans., original French edition published 1968). New York: Harper & Row, 1970.

7. J. H. Holland, J. S. Reitman. “Cognitive Systems Based on Adaptive Algorithms”, Pattern Directed Inference Systems, 7(2), pp.125-149, 1978.

8. J. H. Holland. “Processing and Processors for Schemata”, Associative Information Processing, New York: American Elsevier, pp. 127-146, 1971.

9. S. W. Wilson. “Classifier Systems and the Animat Problem”, Machine Learning, 2(3), pp.199-228, 1987.

10. L. B. Booker. “Triggered Rule Discovery in Classifier Systems”, Proceedings of the 3rd International Conference on Genetic Algorithms (ICGA89), Morgan Kaufmann Publishers Inc, pp. 265-274, June 1989.

11. S. W. Wilson. “ZCS: A Zeroth Level Classifier System”, Evolutionary Computation, 2(1), pp.1-18, 1994.

12. P. W. Frey, D. J. Slate. “Letter Recognition Using Holland-Style Adaptive Classifiers”, Machine Learning, 6, pp.161-182, 1991.

13. S. W. Wilson. “Generalization in the XCS Classifier System”, Proceedings of the Third Genetic Programming Annual Conference. Morgan Kaufmann: San Francisco, CA, 665-674, 1998.

14. W. Stolzmann. “An Introduction to Anticipatory Classifier Systems”, Learning Classifier Systems: From Foundations to Applications, Lecture Notes in Artificial Intelligence, 1813, Springer, pp.175–194, 2000.

15. P. L. Lanzi, et al. Learning Classifier Systems: From Foundations to Applications, Lecture Notes in Artificial Intelligence, 1813, Springer, Berlin, 2000.

16. J. H. Holmes. Evolution-assisted Discovery of Sentinel Features in Epidemiologic Surveillance. Ph.D. thesis, Drexel University, Philadelphia, PA, 1996.

17. J. H. Holmes. “Learning Classifier Systems Applied to Knowledge Discovery in Clinical Research Databases”, Learning Classifier Systems: From Foundations to Applications, Lecture Notes in Artificial Intelligence, 1813, pp.243-264, 2000.

18. J. H. Holmes, P. L. Lanzi, W. Stolzmann, S. W. Wilson. “Learning Classifier Systems: New Models, Successful Applications”, Information Processing Letters, 82, pp.23-30, 2002.

19. S. Kemp. Cognitive Psychology in the Middle Ages, Westport, Connecticut and London: Greenwood Press, 1996.

20. I. Kant. “The Critique of Pure Reason”, Great books of Western World, 42. R.M.Hutchins (ed.), pp. ix-209, Chicago: Encyclopaedia Britannica,1952.

21. K. S. Lashley. Brain Mechanism and Intelligence: A Quantitative Study of Injuries to the Brain, Chicago: Chicago University Press, 1929.

22. J. M. Hunt. “Psychological Development: Early Experience”, Annual Review of Psychology, 30, pp. 103-143, 1979.

23. D. O. Hebb. “A Neuropsychological Theory”, Psychology: A Study of a Science, 1, New York: McGraw-Hill, 1959.

24. M. W. Watson, K. W. Fischer. “Development of Social Roles in Elicited and Spontaneous Behavior during the Preschool Years”, Developmental Psychology, 16, pp. 483-494, 1980.

25. B. F. Skinner. The Behavior of Organisms: An Experimental Analysis. New York: Appleton-Century, 1938. Reprinted by the B. F. Skinner Foundation, 1991 and 1999.

26. J. Piaget. The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development. Chicago: University of Chicago Press, 1985. Original work published 1975.

27. G. Luger, G. F. Luger. Artificial Intelligence, Structures and Strategies for Complex Problem Solving, 4th Edition, Harlow, England: Addison-Wesley, pp. 471, 2002.

28. J. H. Holland. “Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems”, Machine Learning, An Artificial Intelligence Approach, 2(ch. 20), pp. 593-623, Morgan Kaufmann, 1986.

29. J. H. Holland, J. S. Reitman. “Cognitive Systems Based on Adaptive Algorithms”, Pattern Directed Inference Systems, New York: Academic Press, pp. 313-329, 1978.

30. J. McCarthy. “Generality in Artificial Intelligence”, Communications of the ACM, 30(12), pp.1030-1035, Dec.1987.

31. J. McCarthy, P. J. Hayes. “Some Philosophical Problems from the Standpoint of Artificial Intelligence”, Machine Intelligence, 4, pp. 463-502. Edinburgh University Press, 1969.

32. J. H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 2nd Edition, MIT Press, 1992.

33. E. L. Thorndike. “Animal Intelligence: An Experimental Study of the Associative Processes in Animals”, Psychological Review, Monograph Supplement, 8, New York: Macmillan, 1898.

34. E. L. Thorndike. Elements of Psychology. New York: A. G. Seiler, 1905.

35. J. B. Watson. Behavior: An Introduction to Comparative Psychology, New York: Henry Holt & Co, 1914.

36. M. V. Butz, S. W. Wilson. “An algorithmic description of XCS”, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Springer-Verlag GmbH, 6(3-4), pp.144-153, June 2002.

37. J. Piaget. The Mechanisms of Perception, New York: Basic Books, 1969.

38. R. H. Bruning, et al. Cognitive Psychology and Instruction. Englewood Cliffs, NJ: Prentice Hall, 1995.

39. D. E. Rumelhart. “Schemata: The Building Blocks of Cognition”, in R. Spiro, B. Bruce, W. Brewer (eds.), Theoretical Issues in Reading Comprehension, pp. 33-58, Hillsdale, NJ: Lawrence Erlbaum, 1980.

40. M. Anthony, et al. “On Exact Specification by Examples”, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 311-318, ACM Press, July 1992.

41. R. Freivalds, et al. “On the Power of Inductive Inference from Good Examples”, Theoretical Computer Science, 110(1), pp. 131-144, 1993.

42. S. A. Goldman, M. J. Kearns. “On the Complexity of Teaching”, Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pp.303-314. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, August 1991.

43. J. Jackson, A. Tomkins. “A Computational Model of Teaching”, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 319-326. ACM Press, July 1992.

44. A. Shinohara, S. Miyano. “Teachability in Computational Learning”, New Generation Computing, 8(4), pp. 337-347, 1991.

45. J. M. Belmont, E. C. Butterfield. “Learning Strategies as Determinants of Memory Deficiencies”, Cognitive Psychology, 2, pp. 411-420, 1971.

46. T. Buzan, Use Your Head, (Millennium Ed), London, BBC, 2000.

47. R. M. Gagne. The Conditions of Learning and Theory of Instruction, (4th Ed.), New York, NY: Holt, Rinehart and Winston, 1985.

48. R. C. Atkinson, R. M. Shiffrin. “Human Memory: A Proposed System and Its Control Processes”, The Psychology of Learning and Motivation: Advances in Research and Theory, 2, pp. 89-195, New York: Academic Press, 1968.

49. N. Waugh, D. A. Norman. “Primary Memory”, Psychological Review, 72, pp. 89-104, 1965.

50. R. M. Gagne, K. L. Medsker. The Conditions of Learning: Training Applications, 1996.

51. G. A. Miller. “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, Psychological Review, 63, pp.81-97, 1956.

52. H. A. Simon. “The Structure of Ill Structured Problems”, Artificial Intelligence, 4, pp. 181-201, 1973.

53. A. Newell, H. A. Simon. Human Problem Solving, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1972.

54. J. R. Anderson. The Architecture of Cognition, Cambridge, MA: Harvard University Press, 1983.

55. A. Newell. Unified Theories of Cognition, Cambridge, MA: Harvard University Press, 1990.

56. S. K. Reed. Cognition Theory and Applications, 2nd Edition, Brooks/Cole Publishing Company, Pacific Grove, California, 1988.

57. R. L. Solso. Cognitive Psychology. 5th Edition, pp. 534, 1998.

58. G. Luger. Cognitive Science: The Science of Intelligent Systems. San Diego: Academic Press, 1994.

59. K. Chan, et al. “Intraday Volatility in the Stock Index and Stock Index Futures Markets”, Review of Financial Studies, 4(4), pp. 657-684, 1991.

60. K. Chan, et al. “Overnight Information and Intraday Trading Behavior: Evidence from NYSE Cross-listed Stocks and Their Local Market Information”, Journal of Multinational Financial Management, 10(3-4), pp. 495-509, 2000.

61. S. H. Chen. “Lecture 7: Rescale Range Analysis and the Hurst Exponent”, Financial Economics (I), Department of Economics, National Chengchi University, Taiwan, 2000.

62. M. J. Hinich, D. M. Patterson. “Evidence of nonlinearity in Daily Stock Returns”, Journal of Business & Economic Statistics, 3(1), pp. 69-77, 1985.

63. A. P. Chen, et al. “Applying Two-Stage XCS Model on Global Overnight Effect for Local Stock Prediction”, Proceedings of the 9th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES’2005), Melbourne, Australia, 2005.

Appendix A. Relevant XCS Statements

All statements in this appendix are taken from the algorithmic description of Wilson’s XCS. See [36] for the detailed description of XCS.

• A Classifier in XCS

XCS keeps a population of classifiers which represent its knowledge about the problem.

Each classifier is a condition-action-prediction rule having the following parts:

- The condition C∈{0, 1, #}^L specifies the input states (sensory situations) in which the classifier can be applied (matches).

- The action A ∈ {a_1, ..., a_n} specifies the action (possibly a classification) that the classifier proposes.

- The prediction p estimates (keeps an average of) the payoff expected if the classifier matches and its action is taken by the system.

Moreover, each classifier keeps certain additional parameters:

- The prediction error ε estimates the errors made in the predictions.

- The fitness f denotes the classifier's fitness.

- The experience exp counts the number of times since its creation that the classifier has belonged to an action set.

- The time stamp ts denotes the time-step of the last occurrence of a GA in an action set to which this classifier belonged.

- The action set size as estimates the average size of the action sets this classifier has belonged to.

- The numerosity num reflects the number of micro-classifiers (ordinary classifiers) that this classifier, technically called a macroclassifier, represents.
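The parts and parameters listed above can be collected into a single record type. The following is a minimal Python sketch rather than the implementation used in this work. The class name `Classifier`, the attribute names, and the default values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    """One macroclassifier: a condition-action-prediction rule plus bookkeeping."""
    condition: str                # C, a string over {'0', '1', '#'} of length L
    action: int                   # A, index of the proposed action
    prediction: float = 10.0      # p, estimated payoff (placeholder initial value)
    error: float = 0.0            # epsilon, estimated prediction error
    fitness: float = 0.01         # f, fitness (placeholder initial value)
    experience: int = 0           # exp, times the rule has belonged to an action set
    time_stamp: int = 0           # ts, time-step of the last GA in its action set
    action_set_size: float = 1.0  # as, estimated average action set size
    numerosity: int = 1           # num, number of micro-classifiers represented

    def __post_init__(self) -> None:
        # Reject symbols outside the ternary alphabet {0, 1, #}.
        if set(self.condition) - set("01#"):
            raise ValueError(f"invalid condition: {self.condition!r}")


cl = Classifier(condition="01#1", action=2)
print(cl.condition, cl.action, cl.numerosity)
```

Storing the numerosity on the rule lets identical rules share one record, which is how a macroclassifier stands in for several micro-classifiers.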

• The Different Sets

There are four different sets that need to be considered in XCS.

- The population [P] consists of all classifiers that exist in XCS at any time t.

- The match set [M] is formed out of the current [P]. It includes all classifiers that match the current situation σ(t).

- The action set [A] is formed out of the current [M]. It includes all classifiers of [M] that propose the executed action.

- The previous action set [A]_{-1} is the action set that was active in the last execution cycle.
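The relation between these sets is a chain of filters: [M] is a subset of [P], and [A] is a subset of [M]. The short Python sketch below illustrates this under simplifying assumptions. A classifier is reduced to a hypothetical `(condition, action)` tuple, and the function names are illustrative only.

```python
from typing import List, Tuple

Rule = Tuple[str, int]  # (condition, action); a simplified stand-in for a classifier


def matches(condition: str, situation: str) -> bool:
    """True if every non-# symbol of the condition equals the input symbol."""
    return all(c == "#" or c == s for c, s in zip(condition, situation))


def match_set(population: List[Rule], situation: str) -> List[Rule]:
    """[M]: all rules in [P] whose condition matches the current situation."""
    return [rule for rule in population if matches(rule[0], situation)]


def action_set(m: List[Rule], action: int) -> List[Rule]:
    """[A]: all rules in [M] that propose the executed action."""
    return [rule for rule in m if rule[1] == action]


population = [("1#0", 0), ("1#0", 1), ("0##", 1), ("###", 0)]
M = match_set(population, "110")
A = action_set(M, 0)
print(M)  # rules whose condition matches "110"
print(A)  # the subset of M that proposes action 0
```

The previous action set would simply be the value of `A` kept from the preceding cycle.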

• Learning Parameters in XCS

In order to control the learning process in XCS the following parameters are used:

- N specifies the maximum size of the population (in micro-classifiers, i.e., N is the sum of the classifier numerosities).

- β is the learning rate for p, ε, f, and as.

- α, ε_0, and υ are used in calculating the fitness of a classifier.

- γ is the discount factor used in multi-step problems in updating classifier predictions.

- θ_GA is the GA threshold. The GA is applied in a set when the average time since the last GA in the set is greater than θ_GA.

- χ is the probability of applying crossover in the GA.

- μ specifies the probability of mutating an allele in the offspring.

- θ_del is the deletion threshold. If the experience of a classifier is greater than θ_del, its fitness may be considered in its probability of deletion.

- δ specifies the fraction of the mean fitness in [P] below which the fitness of a classifier may be considered in its probability of deletion.

- θ_sub is the subsumption threshold. The experience of a classifier must be greater than θ_sub in order to be able to subsume another classifier.

- P_# is the probability of using a # in one attribute in C when covering.

- p_I, ε_I, and f_I are used as initial values in new classifiers.

- p_explr specifies the probability during action selection of choosing the action uniformly at random.

- θ_mna specifies the minimal number of actions that must be present in a match set [M], or else covering will occur.

- doGASubsumption is a Boolean parameter that specifies if offspring are to be tested for possible logical subsumption by parents.

- doActionSetSubsumption is a Boolean parameter that specifies if action sets are to be tested for subsuming classifiers.
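In an implementation, the parameters above are conveniently kept in one immutable configuration object. The Python sketch below shows one possible layout. The class name `XCSParams`, the field names, and every default value are assumptions for illustration. The defaults are placeholders and are not the settings used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XCSParams:
    """Learning parameters of XCS; all defaults are illustrative placeholders."""
    N: int = 800                  # maximum population size in micro-classifiers
    beta: float = 0.2             # learning rate for p, epsilon, f, and as
    alpha: float = 0.1            # used in the fitness calculation
    epsilon_0: float = 10.0       # used in the fitness calculation
    upsilon: float = 5.0          # used in the fitness calculation
    gamma: float = 0.71           # discount factor for multi-step problems
    theta_ga: int = 25            # GA threshold
    chi: float = 0.8              # crossover probability
    mu: float = 0.04              # per-allele mutation probability
    theta_del: int = 20           # deletion threshold
    delta: float = 0.1            # fraction of mean fitness used in deletion
    theta_sub: int = 20           # subsumption threshold
    p_hash: float = 0.33          # probability of '#' per attribute when covering
    p_init: float = 10.0          # initial prediction p_I
    epsilon_init: float = 0.0     # initial prediction error
    f_init: float = 0.01          # initial fitness
    p_explr: float = 0.5          # probability of choosing a random action
    theta_mna: int = 2            # minimal number of actions in [M]
    do_ga_subsumption: bool = True
    do_action_set_subsumption: bool = False

    def __post_init__(self) -> None:
        # Rates, probabilities, and fractions must lie in the unit interval.
        for name in ("beta", "gamma", "chi", "mu", "delta", "p_hash", "p_explr"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


params = XCSParams(beta=0.15, theta_mna=3)
print(params.beta, params.theta_mna, params.p_explr)
```

Freezing the dataclass prevents a run from silently changing its own settings midway.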

• An Algorithmic Description of XCS

This section presents the algorithms used in XCS. When XCS is started, its modules must first be initialized and the parameters of the environment must be set. After initialization, the main loop, RUN EXPERIMENT, is called. The sub-procedures GENERATE MATCH SET, DOES MATCH, GENERATE COVERING CLASSIFIER, GENERATE PREDICTION ARRAY, SELECT ACTION, GENERATE ACTION SET, UPDATE SET, and UPDATE FITNESS are given below.

RUN EXPERIMENT ( ):

    ρ_{-1} ← 0
    do {
        σ ← env: get situation
        GENERATE MATCH SET [M] out of [P] using σ
        GENERATE PREDICTION ARRAY PA out of [M]
        act ← SELECT ACTION according to PA
        GENERATE ACTION SET [A] out of [M] according to act
        env: execute action act
        ...
            RUN GA in [A] considering σ inserting and possibly deleting in [P]
            empty [A]_{-1}
        else
            [A]_{-1} ← [A]
        ρ_{-1} ← ρ
        σ_{-1} ← σ
    } while (termination criteria are not met)

GENERATE MATCH SET ([P], σ):

    initialize empty set [M]
    while ([M] is empty)
        for each classifier cl in [P]
            if (DOES MATCH classifier cl in situation σ)
                add classifier cl to set [M]
        if (the number of different actions in [M] < θ_mna)
            GENERATE COVERING CLASSIFIER cl_c, considering [M] and σ
            add classifier cl_c to set [P]
            DELETE FROM POPULATION [P]
            empty [M]
    return [M]
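The match-set loop above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this work. A classifier is reduced to a hypothetical `(condition, action)` tuple, the DELETE FROM POPULATION step is omitted, and the helper `cover` together with the argument names `n_actions` and `p_hash` are assumptions.

```python
import random
from typing import List, Set, Tuple

Rule = Tuple[str, int]  # (condition, action); a simplified stand-in for a classifier


def cover(situation: str, present: Set[int], n_actions: int,
          p_hash: float, rng: random.Random) -> Rule:
    """Build a rule that matches `situation` and proposes an action missing from [M]."""
    condition = "".join("#" if rng.random() < p_hash else s for s in situation)
    missing = [a for a in range(n_actions) if a not in present]
    return condition, rng.choice(missing)


def generate_match_set(population: List[Rule], situation: str, n_actions: int,
                       theta_mna: int, p_hash: float,
                       rng: random.Random) -> List[Rule]:
    """Return [M], covering until it holds at least theta_mna distinct actions."""
    if theta_mna > n_actions:
        raise ValueError("theta_mna cannot exceed the number of actions")
    while True:
        m = [rule for rule in population
             if all(c == "#" or c == s for c, s in zip(rule[0], situation))]
        present = {action for _, action in m}
        if len(present) >= theta_mna:
            return m
        # Covering: add one new matching rule to [P], then rebuild [M].
        population.append(cover(situation, present, n_actions, p_hash, rng))


rng = random.Random(0)
population: List[Rule] = [("1#1", 0)]
m = generate_match_set(population, "101", n_actions=3, theta_mna=3,
                       p_hash=0.33, rng=rng)
print(len(m), sorted(action for _, action in m))
```

Because every covering rule is built from the current situation and carries an action not yet present, each pass adds exactly one new action to [M], so the loop terminates.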

DOES MATCH (cl, σ):

    for each attribute x in C_cl
        if (x ≠ # and x ≠ the corresponding attribute in σ)
            return false
    return true

GENERATE COVERING CLASSIFIER ([M], σ):

    initialize classifier cl
    initialize condition C_cl with the length of σ
    for each attribute x in C_cl
        ...

GENERATE PREDICTION ARRAY ([M]):

    initialize prediction array PA to all null
    initialize fitness sum array FSA to all 0.0
    for each classifier cl in [M]
        ...
    for each possible action A
        if (FSA[A] is not zero)
            PA[A] ← PA[A] / FSA[A]
    return PA
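The prediction array can be sketched in Python as shown below. The accumulation step inside the loop over [M] is not shown in the listing above, so the sketch assumes the usual fitness-weighted sum: each classifier adds p · f to PA for its action and f to FSA. That step, the tuple layout `(action, prediction, fitness)`, and the function name are assumptions.

```python
from typing import List, Optional, Tuple

# (action, prediction p, fitness f); a simplified stand-in for a classifier
Rule = Tuple[int, float, float]


def generate_prediction_array(m: List[Rule], n_actions: int) -> List[Optional[float]]:
    """Fitness-weighted average prediction per action; None where no rule votes."""
    pa: List[Optional[float]] = [None] * n_actions
    fsa = [0.0] * n_actions  # fitness sum per action
    for action, p, f in m:
        # Assumed accumulation step: sum p * f and f separately per action.
        previous = pa[action]
        pa[action] = p * f if previous is None else previous + p * f
        fsa[action] += f
    for a in range(n_actions):
        total = pa[a]
        if total is not None and fsa[a] != 0.0:
            pa[a] = total / fsa[a]
    return pa


M = [(0, 1000.0, 0.5), (0, 0.0, 0.5), (1, 400.0, 0.25)]
PA = generate_prediction_array(M, n_actions=3)
print(PA)
```

Actions that no classifier in [M] proposes keep a null entry, which is what lets action selection skip them.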

SELECT ACTION (PA):

    if (RandomNumber[0, 1) < p_explr)
        // Do pure exploration here
        return a randomly chosen action from those not null in PA
    else
        // Do pure exploitation here
        return the best action in PA
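The selection rule above is an explore-or-exploit choice controlled by a single probability. A minimal Python sketch follows. The function name, the use of `None` for null entries, and the explicit `rng` argument are assumptions made so that the example is self-contained and repeatable.

```python
import random
from typing import List, Optional


def select_action(pa: List[Optional[float]], p_explr: float,
                  rng: random.Random) -> int:
    """Pick a random non-null action with probability p_explr, else the best one."""
    candidates = [a for a, value in enumerate(pa) if value is not None]
    if not candidates:
        raise ValueError("prediction array has no non-null entry")
    if rng.random() < p_explr:
        # Pure exploration: uniform choice among actions with a prediction.
        return rng.choice(candidates)
    # Pure exploitation: the action with the highest predicted payoff.
    return max(candidates, key=lambda a: pa[a])


pa = [500.0, None, 820.0]
rng = random.Random(42)
print(select_action(pa, p_explr=0.0, rng=rng))  # always exploits
print(select_action(pa, p_explr=1.0, rng=rng))  # always explores
```

With `p_explr=0.0` the call always returns the best action, and with `p_explr=1.0` it always explores, because `rng.random()` lies in [0, 1).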

GENERATE ACTION SET ([M], act):

    initialize empty set [A]
    for each classifier cl in [M]
        ...

UPDATE SET ([A], P, [P]):

    ...
        // update prediction error ε_cl
        if (exp_cl < 1 / β)
            ε_cl ← ε_cl + (|P - p_cl| - ε_cl) / exp_cl
        else
            ε_cl ← ε_cl + β * (|P - p_cl| - ε_cl)
        // update action set size estimate as_cl
        if (exp_cl < 1 / β)
            as_cl ← as_cl + (Σ_{c∈[A]} num_c - as_cl) / exp_cl
        else
            as_cl ← as_cl + β * (Σ_{c∈[A]} num_c - as_cl)
    UPDATE FITNESS in set [A]
    if (doActionSetSubsumption)
        DO ACTION SET SUBSUMPTION in [A] updating [P]
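Both updates shown above follow one pattern: a plain running average while the classifier has little experience, then a β-weighted moving average. The Python sketch below factors that pattern into a helper. It covers only the prediction error and the action set size estimate. The class `Rule`, the helper `running_update`, and the assumption that the experience counter has already been incremented to at least 1 are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """Only the classifier fields needed for this update."""
    prediction: float
    error: float
    action_set_size: float
    experience: int
    numerosity: int = 1


def running_update(old: float, target: float, experience: int, beta: float) -> float:
    """Plain average while experience < 1/beta, then a beta-weighted average."""
    if experience < 1:
        raise ValueError("experience must be at least 1")
    if experience < 1.0 / beta:
        return old + (target - old) / experience
    return old + beta * (target - old)


def update_errors_and_sizes(action_set: List[Rule], payoff: float, beta: float) -> None:
    """Update the error and the action set size estimate of every rule in [A]."""
    total_numerosity = sum(rule.numerosity for rule in action_set)
    for rule in action_set:
        rule.error = running_update(
            rule.error, abs(payoff - rule.prediction), rule.experience, beta)
        rule.action_set_size = running_update(
            rule.action_set_size, total_numerosity, rule.experience, beta)


young = Rule(prediction=100.0, error=0.0, action_set_size=1.0, experience=1, numerosity=2)
old = Rule(prediction=50.0, error=10.0, action_set_size=4.0, experience=10)
update_errors_and_sizes([young, old], payoff=80.0, beta=0.2)
print(young.error, young.action_set_size)
print(old.error, old.action_set_size)
```

The early plain average lets a new classifier move quickly toward the observed values, while the later fixed rate keeps an experienced classifier responsive to change.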

UPDATE FITNESS ([A]):

    accuracySum ← 0
    initialize accuracy vector k
    for each classifier cl in [A]
        if (ε_cl < ε_0)
            k(cl) ← 1
        else
            k(cl) ← α * (ε_cl / ε_0)^(-υ)
        accuracySum ← accuracySum + k(cl) * num_cl
    for each classifier cl in [A]
        f_cl ← f_cl + β * (k(cl) * num_cl / accuracySum - f_cl)
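The fitness update above translates almost line by line into Python. The sketch below is a minimal illustration. The class `Rule` and the function name are assumptions, and ε_0 is assumed to be strictly positive so that the accuracy ratio is well defined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """Only the classifier fields needed for the fitness update."""
    error: float
    fitness: float
    numerosity: int = 1


def update_fitness(action_set: List[Rule], beta: float, alpha: float,
                   epsilon_0: float, upsilon: float) -> None:
    """Move each fitness toward the rule's share of the set's total accuracy."""
    accuracies = []
    for rule in action_set:
        if rule.error < epsilon_0:
            accuracies.append(1.0)  # accurate enough: full accuracy
        else:
            accuracies.append(alpha * (rule.error / epsilon_0) ** -upsilon)
    accuracy_sum = sum(k * rule.numerosity for k, rule in zip(accuracies, action_set))
    for k, rule in zip(accuracies, action_set):
        relative_accuracy = k * rule.numerosity / accuracy_sum
        rule.fitness += beta * (relative_accuracy - rule.fitness)


accurate = Rule(error=5.0, fitness=0.0)
inaccurate = Rule(error=20.0, fitness=0.0)
update_fitness([accurate, inaccurate], beta=0.2, alpha=0.1, epsilon_0=10.0, upsilon=5.0)
print(accurate.fitness, inaccurate.fitness)
```

Because the relative accuracies sum to 1 over the action set, the two fitness values here add up to β after one update from zero, with nearly all of it going to the accurate rule.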

Appendix B. Knowledge Population

Table: Knowledge Population

Knowledge   Condition part          Action part
1           111100000011000110000   001
2           011100000001010110100   101
3           111100000111000111010   011
4           111100000011000111100   011
5           011100000111000110000   010
6           000101001000111110011   100
7           000111001000111011100   011
8           100011100000111011111   111
9           000011100000111011111   001
10          000011100100111011001   011
11          110011101100110011000   110
12          000011100100111011000   111
13          000011100110111010111   000
14          100011100000111010010   011
15          000011111000111010010   001
16          000011100000111010001   011
17          100011100000111011101   010
18          010011101000111011100   110
19          000011111000111010100   000
20          000011100111001010001   011
21          000011100000111011000   011
22          110011101111001011000   101
23          110011101111001011111   110
24          010011101000010011100   000
25          100011100100110011000   100
26          111000001011011011000   100
27          111100000111000110000   100
28          011100101001111110110   110
29          001101100000111110110   100
