6.2 Remarks
Despite the advantages mentioned above, the proposed method has some limitations, which are left as future work.
The number of binary variables $u_i^{k,l}$ and $v_r^{k,l}$ is directly proportional to the number of objects n in the data set, so the computation time increases substantially as the number of objects grows. The numbers of binary variables $\lambda_j^{k,l}$ and $d_{j,p}^{k,l}$ are likewise directly proportional to the numbers of attributes m and sub-attributes q, respectively, but this is a relatively minor problem because the numbers of attributes and sub-attributes are not very large in most cases.
Reducing the number of $u_i^{k,l}$ and $v_r^{k,l}$ variables is therefore a major issue for future work.
References
Aja-Fernandez, S., Luis-Garcia, R., Martin-Fernandez, M. A., and Alberola-Lopez, C. (2004), “A computational TW3 classifier for skeletal maturity assessment — a computing with words approach”, Journal of Biomedical Informatics, 37, 99–107.
Beynon, M. J. and Buchanan, K. L. (2003), “An illustration of variable precision rough set theory: The gender classification of the European Barn Swallow (Hirundo rustica)”, Bulletin of Mathematical Biology, 65, 835–858.
Dunn, D. C., Thomas, W. E. G., and Hunter, J. O. (1980), “An evaluation of highly selective vagotomy in the treatment of chronic duodenal ulcer”, Surgery Gynecology Obstetrics, 150, 845–849.
Geurts, P., Fillt, M., Seny, D., Meuwis, A., Malaise, M., Merville, M.-P., and Wehenkel, L. (2005), “Proteomic mass spectra classification using decision tree based ensemble methods”, Bioinformatics, 21, 3138–3145.
Goligher, J. C., Hill, G. L., Kenny, T. E., and Nutter, E. (1978), “Proximal gastric vagotomy without drainage for duodenal ulcer: results after 5–8 years”, British Journal of Surgery, 65, 145–151.
Hvidsten, T. T., Lægreid, A., and Komorowski, J. (2003), “Learning rule-based models of biological process from gene expression time profiles using gene ontology”, Bioinformatics, 19, 1116–1123.
Kay, A. (1967), “Memorial lecture — an evaluation of gastric acid secretion tests”, Gastroenterology, 53, 834–852.
Li, H.-L. and Fu, C.-J. (2005), “A linear programming approach for identifying a consensus sequence on DNA sequences”, Bioinformatics, 21, 1838–1845.
Li, R. and Wang, Z.-O. (2004), “Mining classification rules using rough sets and neural networks”, European Journal of Operational Research, 157, 439–448.
Pawlak, Z. (1982), “Rough sets”, International Journal of Information and Computer Sciences, 11, 341–356.
Predki, B., Slowinski, R., Stefanowski, J., Susmaga, R., and Wilk, S. (1998), “ROSE - software implementation of the rough set theory”, in L. Polkowski and A. Skowron (eds.), Rough Sets and Current Trends in Computing, volume 1424 of Lecture Notes in Computer Science, 605–608, Berlin: Springer.
Predki, B. and Wilk, Sz. (1999), “Rough set based data exploration using ROSE system”, in Z. W. Ras and A. Skowron (eds.), Foundations of Intelligent Systems, volume 1609 of Lecture Notes in Computer Science, 172–180, Berlin: Springer.
Quinlan, J. R. (1986), “Induction of decision trees”, Machine Learning, 1, 81–106.
Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Los Altos, CA: Morgan Kaufmann.
Shen, Lixiang and Loh, Han Tong (2004), “Applying rough sets to market timing decisions”, Decision Support Systems, 37, 583–598.
Slowinski, K. (1992), “Rough classification of HSV patients”, in R. Slowinski (ed.), Intelligent Decision Support: Handbook of applications and advances of the rough sets theory, 77–94, Dordrecht, Netherlands: Kluwer.
Sun, M. and Xiong, M. (2003), “A mathematical programming approach for gene selection and tissue classification”, Bioinformatics, 19, 1243–1251.
Tay, Francis E. H. and Shen, Lixiang (2002), “Economic and financial prediction using rough sets model”, European Journal of Operational Research, 141, 641–659.
Tsumoto, R. (1999), “Discovery of rules for medical expert systems - rough set approach”, in Proceedings of Third International Conference on Computational Intelligence and Multimedia Applications, 212–216, ICCIMA’99.
Zhang, H., Yu, C.-Y., Singer, B., and Xiong, M. (2001), “Recursive partitioning for tumor classification with gene expression microarray data”, in Proceedings of the National Academy of Sciences of the USA, volume 98, 6730–6735.
Ziarko, W. (1993), “Variable precision rough set model”, Journal of Computer and System Sciences, 46, 39–59.
Appendices
A The HSV Patients Data Set
The data set shown in Table A.1 comprises 122 patients with duodenal ulcer treated by HSV, described by 11 preoperative attributes. Attributes 1–4 concern anamnesis, and the remaining attributes relate to preoperative gastric secretion examined with the histaminic test of Kay (1967). The patients are classified according to the long-term result of HSV, evaluated by a surgeon using the modified Visick grading. The grades are defined as follows (Goligher et al., 1978):
• Excellent: absolutely no symptoms, perfect result. The class index, 1, is given.
• Very good: patient considers result perfect, but interrogation elicits mild occasional symptoms easily controlled by a minor adjustment of diet. The class index, 2, is given.
• Satisfactory: mild or moderate symptoms easily controlled by care, which cause some discomfort, but patient and surgeon are satisfied with the result, which does not interfere seriously with life or work. The class index, 3, is given.
• Unsatisfactory: moderate or severe symptoms or complications which interfere with work or normal life; patient or surgeon dissatisfied with result; includes all cases with recurrent ulcer and those submitted to further operation, even though the latter may have been followed by considerable symptomatic improvement. The class index, 4, is given.
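For readers who handle the class labels programmatically, the four grades and their class indices listed above can be kept in a small enumeration. The following Python sketch is only an illustration; the names `VisickGrade` and `grade_of` are hypothetical and are not part of the study or of MCOCR.

```python
from enum import IntEnum


class VisickGrade(IntEnum):
    """Modified Visick grades, numbered with the class indices given above."""
    EXCELLENT = 1
    VERY_GOOD = 2
    SATISFACTORY = 3
    UNSATISFACTORY = 4


def grade_of(class_index):
    """Return the grade for a class index, rejecting anything outside 1..4."""
    try:
        return VisickGrade(class_index)
    except ValueError:
        raise ValueError(f"unknown class index: {class_index}") from None
```

For example, `grade_of(3)` yields `VisickGrade.SATISFACTORY`, while an index outside 1 to 4 raises `ValueError`.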
Table A.1: The HSV patients data set with original values of attributes
B The European Barn Swallow Data Set
Table B.2: The European barn swallow data set with original values of attributes
Table B.2: (continued)
No.  a1    a2   a3  a4   a5   a6   a7    a8    class
58   30.3  126  44  125  130  130  17.9  4607  2
59   30    88   48  88   122  121  19.2  4962  1
60   30.7  117  44  119  127  126  19.6  5316  2
61   29.5  85   48  84   118  118  17.5  5080  1
62   30.8  108  47  108  126  127  18.9  5671  2
63   30.3  97   46  95   126  125  18.4  5316  1
64   30.6  97   48  97   128  127  18.1  5434  1
65   29.6  90   48  77   122  121  18.2  5198  1
66   30.2  112  46  113  129  130  19.6  5198  2
67   29.6  93   45  91   119  118  21    5080  1
68   29.7  86   49  86   125  124  17.6  5080  1
69   28.9  74   46  103  127  126  17.3  4726  2
C The Input Data File Format for MCOCR
The file shown in Appendix D is a sample input data file for MCOCR. Each input data file represents one input data set and contains three tags:
• Classes: marks the beginning of the class definitions.
• Attributes: marks the beginning of the attribute definitions.
• Objects: marks the beginning of the object details.
The tags "Attributes", "Classes" and "Objects" may appear in any order.
CLASSES.
Format—
Classes [number of classes] [1st class index] [2nd class index] ... [nth class index]
meaning:
• the first field is the number of classes in the input data set
• the following fields are the class indices, one per class
Example—
Classes 4 c1 c2 c3 c4
The example means that there are four classes in the input data set, with indices c1, c2, c3 and c4, respectively.
NOTE: A class index must begin with ‘C’ or ‘c’, followed by a number.
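The rules for the Classes line can be checked mechanically. The Python sketch below is one possible reading of the format, not MCOCR's own code: the function name `parse_classes_line` is hypothetical, and it assumes that "a number" means one or more decimal digits and that fields are separated by SPACE characters only.

```python
import re

# Assumption: a class index is 'C' or 'c' followed by one or more digits.
CLASS_INDEX = re.compile(r"[Cc][0-9]+")


def parse_classes_line(line):
    """Parse a line such as 'Classes 4 c1 c2 c3 c4' into its class indices."""
    # Split on SPACE only; empty strings from repeated spaces are dropped.
    fields = [f for f in line.strip().split(" ") if f]
    if len(fields) < 2 or fields[0] != "Classes":
        raise ValueError("expected the Classes tag followed by a class count")
    if not fields[1].isdecimal():
        raise ValueError(f"class count is not a number: {fields[1]!r}")
    indices = fields[2:]
    if int(fields[1]) != len(indices):
        raise ValueError("class count does not match the number of indices")
    for index in indices:
        if CLASS_INDEX.fullmatch(index) is None:
            raise ValueError(f"invalid class index: {index!r}")
    return indices
```

Applied to the example line above, the function returns the four indices in the order they were written; a mismatched count or a badly named index raises `ValueError`.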
ATTRIBUTES.
Format—
Attributes
[1st attribute index] [number of values of the 1st attribute]
[2nd attribute index] [number of values of the 2nd attribute]
...
[nth attribute index] [number of values of the nth attribute]
meaning:
• each row represents an attribute
• the first field is the attribute index
• the second field is the number of values that attribute can take
Example—
Attributes
a1 3
a2 2
a3 4
a4 3
The example means that there are four attributes in the input data set, with indices a1, a2, a3 and a4, respectively.
• Attribute a1 has three possible attribute values, i.e., 1, 2, 3.
• Attribute a2 has two possible attribute values, i.e., 1, 2.
• Attribute a3 has four possible attribute values, i.e., 1, 2, 3, 4.
• Attribute a4 has three possible attribute values, i.e., 1, 2, 3.
NOTE: An attribute index must begin with ‘A’ or ‘a’, followed by a number.
OBJECTS.
Format—
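Objects
[object index] [1st attribute value] [2nd attribute value] ... [nth attribute value] [class index]
meaning:
• each row represents an object
• the first field is the object index
• the last field is the class index of that object
• the remaining fields are the values of the attributes for that object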
Example—
Objects
o1 3 2 4 3 c1
o2 2 2 1 2 c1
o3 1 1 2 1 c2
o4 2 2 3 3 c3
o5 3 1 4 2 c4
The example means that there are five objects in the input data set, with indices o1, o2, o3, o4 and o5.
• Object o1 belongs to class c1, and its attribute values are 3, 2, 4, 3.
• Object o2 belongs to class c1, and its attribute values are 2, 2, 1, 2.
• Object o3 belongs to class c2, and its attribute values are 1, 1, 2, 1.
• Object o4 belongs to class c3, and its attribute values are 2, 2, 3, 3.
• Object o5 belongs to class c4, and its attribute values are 3, 1, 4, 2.
NOTE: An object index must begin with ‘O’ or ‘o’, followed by a number.
NOTE: The separator between two fields can only be SPACE.
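The format described in this appendix is simple enough to read with a short script. The following Python sketch shows one way a reader might parse and validate such a file; it is not MCOCR's actual reader. The names `parse_input`, `split_fields`, `require` and `validate` are hypothetical. The sketch assumes that a section runs until the next tag or the end of the file, that blank lines are ignored, and that attribute values are integers coded from 1 up to the declared number of values. The sample text is reconstructed from the example descriptions in this appendix, with the tags deliberately placed in a different order.

```python
import re

TAGS = ("Classes", "Attributes", "Objects")


def split_fields(line):
    """Split on SPACE only, as the format requires; empty fields are dropped."""
    return [f for f in line.strip("\r\n").split(" ") if f]


def require(condition, message):
    if not condition:
        raise ValueError(message)


def validate(classes, attributes, objects):
    """Check the naming rules and that every object row fits the definitions."""
    for name in classes:
        require(re.fullmatch(r"[Cc][0-9]+", name), f"bad class index: {name}")
    for name in attributes:
        require(re.fullmatch(r"[Aa][0-9]+", name), f"bad attribute index: {name}")
    sizes = list(attributes.values())  # dicts keep insertion order
    for name, (values, cls) in objects.items():
        require(re.fullmatch(r"[Oo][0-9]+", name), f"bad object index: {name}")
        require(cls in classes, f"{name}: undefined class {cls}")
        require(len(values) == len(sizes), f"{name}: wrong number of values")
        # Assumption: an attribute with k values is coded 1..k.
        require(all(1 <= v <= k for v, k in zip(values, sizes)),
                f"{name}: attribute value out of range")


def parse_input(text):
    """Parse the three tagged sections, in any order, into plain Python data."""
    classes, attributes, objects = [], {}, {}
    section = None
    for raw in text.splitlines():
        fields = split_fields(raw)
        if not fields:
            continue  # assumption: blank lines are ignored
        if fields[0] in TAGS:
            section = fields[0]
            if section == "Classes":
                # Classes [number of classes] [class index] ...
                require(len(fields) >= 2 and fields[1].isdecimal(),
                        "missing class count")
                classes = fields[2:]
                require(int(fields[1]) == len(classes), "class count mismatch")
        elif section == "Attributes":
            # [attribute index] [number of values]
            require(len(fields) == 2 and fields[1].isdecimal(),
                    f"bad attribute row: {raw!r}")
            attributes[fields[0]] = int(fields[1])
        elif section == "Objects":
            # [object index] [attribute values ...] [class index]
            require(len(fields) >= 2, f"bad object row: {raw!r}")
            objects[fields[0]] = ([int(v) for v in fields[1:-1]], fields[-1])
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    # Validation runs after all sections are read, so tag order does not matter.
    validate(classes, attributes, objects)
    return {"classes": classes, "attributes": attributes, "objects": objects}


# Sample reconstructed from the example descriptions (an assumption).
SAMPLE = """\
Objects
o1 3 2 4 3 c1
o2 2 2 1 2 c1
o3 1 1 2 1 c2
o4 2 2 3 3 c3
o5 3 1 4 2 c4
Classes 4 c1 c2 c3 c4
Attributes
a1 3
a2 2
a3 4
a4 3
"""
DATA = parse_input(SAMPLE)
```

Deferring validation until every section has been read is what allows the tags to appear in any order: an object row can be parsed before the classes and attributes it refers to have been defined.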