Remarks - Deriving Multi-Criteria Classification Rules from an Optimization Viewpoint


6.2 Remarks

Despite the advantages mentioned above, the proposed method has some limitations, which are left as future work for this study.

The numbers of binary variables u_i^{k,l} and v_r^{k,l} are directly proportional to the number of objects n in the data set. As the number of objects grows, the computation time increases considerably. The numbers of binary variables λ_j^{k,l} and d_{j,p}^{k,l} are likewise directly proportional to the numbers of attributes m and sub-attributes q, respectively, but this is a relatively minor problem, since the numbers of attributes and sub-attributes are not very large in most cases.

How to reduce the number of variables u_i^{k,l} and v_r^{k,l} is therefore a major issue for future work.

References

Aja-Fernandez, S., Luis-Garcia, R., Martin-Fernandez, M. A., and Alberola-Lopez, C. (2004), “A computational TW3 classifier for skeletal maturity assessment — a computing with words approach”, Journal of Biomedical Informatics, 37, 99–107.

Beynon, M. J. and Buchanan, K. L. (2003), “An illustration of variable precision rough set theory: The gender classification of the European Barn Swallow (Hirundo rustica)”, Bulletin of Mathematical Biology, 65, 835–858.

Dunn, D. C., Thomas, W. E. G., and Hunter, J. O. (1980), “An evaluation of highly selective vagotomy in the treatment of chronic duodenal ulcer”, Surgery, Gynecology & Obstetrics, 150, 845–849.

Geurts, P., Fillet, M., Seny, D., Meuwis, A., Malaise, M., Merville, M.-P., and Wehenkel, L. (2005), “Proteomic mass spectra classification using decision tree based ensemble methods”, Bioinformatics, 21, 3138–3145.

Goligher, J. C., Hill, G. L., Kenny, T. E., and Nutter, E. (1978), “Proximal gastric vagotomy without drainage for duodenal ulcer: results after 5–8 years”, British Journal of Surgery, 65, 145–151.

Hvidsten, T. T., Lægreid, A., and Komorowski, J. (2003), “Learning rule-based models of biological process from gene expression time profiles using gene ontology”, Bioinformatics, 19, 1116–1123.

Kay, A. (1967), “Memorial lecture — an evaluation of gastric acid secretion tests”, Gastroenterology, 53, 834–852.

Li, H.-L. and Fu, C.-J. (2005), “A linear programming approach for identifying a consensus sequence on DNA sequences”, Bioinformatics, 21, 1838–1845.

Li, R. and Wang, Z.-O. (2004), “Mining classification rules using rough sets and neural networks”, European Journal of Operational Research, 157, 439–448.

Pawlak, Z. (1982), “Rough sets”, International Journal of Computer and Information Sciences, 11, 341–356.

Predki, B., Slowinski, R., Stefanowski, J., Susmaga, R., and Wilk, S. (1998), “ROSE — software implementation of the rough set theory”, in L. Polkowski and A. Skowron (eds.), Rough Sets and Current Trends in Computing, volume 1424 of Lecture Notes in Computer Science, 605–608, Berlin: Springer.

Predki, B. and Wilk, Sz. (1999), “Rough set based data exploration using ROSE system”, in Z. W. Ras and A. Skowron (eds.), Foundations of Intelligent Systems, volume 1609 of Lecture Notes in Computer Science, 172–180, Berlin: Springer.

Quinlan, J. R. (1986), “Induction of decision trees”, Machine Learning, 1, 81–106.

Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Los Altos, CA: Morgan Kaufmann.

Shen, Lixiang and Loh, Han Tong (2004), “Applying rough sets to market timing decisions”, Decision Support Systems, 37, 583–598.

Slowinski, K. (1992), “Rough classification of HSV patients”, in R. Slowinski (ed.), Intelligent Decision Support — Handbook of applications and advances of the rough sets theory, 77–94, Dordrecht, Netherlands: Kluwer.

Sun, M. and Xiong, M. (2003), “A mathematical programming approach for gene selection and tissue classification”, Bioinformatics, 19, 1243–1251.

Tay, Francis E. H. and Shen, Lixiang (2002), “Economic and financial prediction using rough sets model”, European Journal of Operational Research, 141, 641–659.

Tsumoto, R. (1999), “Discovery of rules for medical expert systems — rough set approach”, in Proceedings of Third International Conference on Computational Intelligence and Multimedia Applications, 212–216, ICCIMA’99.

Zhang, H., Yu, C.-Y., Singer, B., and Xiong, M. (2001), “Recursive partitioning for tumor classification with gene expression microarray data”, in Proceedings of the National Academy of Sciences of the USA, volume 98, 6730–6735.

Ziarko, W. (1993), “Variable precision rough set model”, Journal of Computer and System Sciences, 46, 39–59.

Appendices

A The HSV Patients Data Set

The data set shown in Table A.1 is composed of 122 patients with duodenal ulcer treated by highly selective vagotomy (HSV), described by 11 pre-operative attributes. Attributes 1–4 concern anamnesis, and the remaining attributes relate to pre-operative gastric secretion examined with the histaminic test of Kay (1967). The patients are classified according to the long-term result of HSV, evaluated by a surgeon using the modified Visick grading. The grading follows the definitions of Goligher et al. (1978):

• Excellent: absolutely no symptoms, perfect result. The class index, 1, is given.

• Very good: patient considers result perfect, but interrogation elicits mild occasional symptoms easily controlled by a minor adjustment of diet. The class index, 2, is given.

• Satisfactory: mild or moderate symptoms easily controlled by care, which cause some discomfort, but patient and surgeon are satisfied with result which does not interfere seriously with life or work. The class index, 3, is given.

• Unsatisfactory: moderate or severe symptoms or complications which interfere with work or normal life; patient or surgeon dissatisfied with result; includes all cases with recurrent ulcer and those submitted to further operation, even though the latter may have been followed by considerable symptomatic improvement. The class index, 4, is given.

Table A.1: The HSV patients data set with original values of attributes


B The European Barn Swallow Data Set

Table B.2: The European barn swallow data set with original values of attributes

Table B.2 (continued)

No.  a1    a2   a3  a4   a5   a6   a7    a8    class
58   30.3  126  44  125  130  130  17.9  4607  2
59   30    88   48  88   122  121  19.2  4962  1
60   30.7  117  44  119  127  126  19.6  5316  2
61   29.5  85   48  84   118  118  17.5  5080  1
62   30.8  108  47  108  126  127  18.9  5671  2
63   30.3  97   46  95   126  125  18.4  5316  1
64   30.6  97   48  97   128  127  18.1  5434  1
65   29.6  90   48  77   122  121  18.2  5198  1
66   30.2  112  46  113  129  130  19.6  5198  2
67   29.6  93   45  91   119  118  21    5080  1
68   29.7  86   49  86   125  124  17.6  5080  1
69   28.9  74   46  103  127  126  17.3  4726  2

C The Input Data File Format for MCOCR

The file shown in Appendix D is a sample input data file for MCOCR. An input data file represents one input data set and contains three tags.

• Classes: tells MCOCR that the class definitions begin here.

• Attributes: tells MCOCR that the attribute definitions begin here.

• Objects: tells MCOCR that the object details begin here.

The order of the tags "Attributes," "Classes" and "Objects" is arbitrary.
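Because the tags may appear in any order, a program reading the file has to locate each section by its tag rather than by its position. The following Python sketch shows one possible way to do this. It is not MCOCR's own code: the function name split_sections and the error handling are assumptions made for illustration only.

```python
# Hypothetical helper, not part of MCOCR: groups the lines of an input
# data file under the tag ("Classes", "Attributes" or "Objects") that
# opens each section, regardless of the order in which the tags appear.
TAGS = ("Classes", "Attributes", "Objects")


def split_sections(text):
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no information
        first_field = line.split(" ")[0]
        if first_field in TAGS:
            if first_field in sections:
                raise ValueError(f"tag appears twice: {first_field}")
            current = first_field
            sections[current] = [line]
        elif current is None:
            raise ValueError(f"content before any tag: {line!r}")
        else:
            sections[current].append(line)
    missing = [tag for tag in TAGS if tag not in sections]
    if missing:
        raise ValueError(f"missing tags: {missing}")
    return sections
```

Each returned list starts with the tag line itself, so the "Classes" entry holds the whole one-line class definition, while the other two hold the tag followed by one row per attribute or object.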

CLASSES.

Format—

Classes [number of classes] [1st class index] [2nd class index] ... [nth class index]

meaning:

• the first field is the number of classes in the input data set

• the following fields are the class indices, one per class

Example—

Classes 4 c1 c2 c3 c4

The example means that there are four classes in the input data set. Their indices are c1, c2, c3 and c4, respectively.

NOTE: The initial character of a class index must be ‘C’ or ‘c’, followed by a number.
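The rules for the Classes line can be checked mechanically. The sketch below is an illustration in Python, not MCOCR's implementation; the name parse_classes is hypothetical, and it assumes that a run of several spaces between two fields is acceptable.

```python
import re

# A class index is 'C' or 'c' followed by a number, e.g. c1 or C12.
CLASS_INDEX = re.compile(r"[Cc]\d+")


def parse_classes(line):
    """Return the class indices declared on a 'Classes' line."""
    # Fields are separated by SPACE only; empty strings from repeated
    # spaces are dropped (an assumption of this sketch).
    fields = [f for f in line.split(" ") if f]
    if len(fields) < 2 or fields[0] != "Classes":
        raise ValueError("line must start with 'Classes' and a count")
    count = int(fields[1])
    indices = fields[2:]
    if len(indices) != count:
        raise ValueError(f"expected {count} class indices, got {len(indices)}")
    for index in indices:
        if not CLASS_INDEX.fullmatch(index):
            raise ValueError(f"invalid class index: {index!r}")
    return indices
```

Applied to the example line, parse_classes("Classes 4 c1 c2 c3 c4") yields the four indices c1 to c4, and a line whose count disagrees with the number of listed indices is rejected.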

ATTRIBUTES.

Format—

Attributes

[1st attribute index] [number of values of 1st attribute]

[2nd attribute index] [number of values of 2nd attribute]

...

[nth attribute index] [number of values of nth attribute]

meaning:

• each row represents an attribute

• the first field is the attribute index

• the second field is the number of values that the attribute can take

Example—

Attributes

a1 3

a2 2

a3 4

a4 3

The example means that there are four attributes in the input data set. Their indices are a1, a2, a3 and a4, respectively.

• Attribute a1 has three possible attribute values, i.e., 1, 2, 3.

• Attribute a2 has two possible attribute values, i.e., 1, 2.

• Attribute a3 has four possible attribute values, i.e., 1, 2, 3, 4.

• Attribute a4 has three possible attribute values, i.e., 1, 2, 3.

NOTE: The initial character of an attribute index must be ‘A’ or ‘a’, followed by a number.
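As an illustration of the attribute rules, the Python sketch below reads an Attributes section into a mapping from attribute index to its number of values. The function name parse_attributes is hypothetical and the code is not taken from MCOCR; it assumes the section is passed in as a list of lines beginning with the tag line.

```python
import re

# An attribute index is 'A' or 'a' followed by a number, e.g. a1.
ATTRIBUTE_INDEX = re.compile(r"[Aa]\d+")


def parse_attributes(lines):
    """Map each attribute index to its declared number of values."""
    if not lines or lines[0].strip() != "Attributes":
        raise ValueError("section must start with the 'Attributes' tag")
    attributes = {}
    for line in lines[1:]:
        fields = [f for f in line.split(" ") if f]
        if len(fields) != 2:
            raise ValueError(f"expected two fields, got: {line!r}")
        index, count = fields[0], int(fields[1])
        if not ATTRIBUTE_INDEX.fullmatch(index):
            raise ValueError(f"invalid attribute index: {index!r}")
        if count < 1:
            raise ValueError(f"attribute {index} needs at least one value")
        attributes[index] = count
    return attributes
```

For the four attributes described above, the result maps a1 to 3, a2 to 2, a3 to 4 and a4 to 3. Since Python dictionaries keep insertion order, the column order of the attributes is preserved for later use.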

OBJECTS.

Format—

Objects

[object index] [1st attribute value] ... [nth attribute value] [class index]

meaning:

• each row represents an object

• the first field is the object index

• the last field is the class index of the object

• the remaining fields are the object's values for each attribute, in order

Example—

Objects

o1 3 2 4 3 c1

o2 2 2 1 2 c1

o3 1 1 2 1 c2

o4 2 2 3 3 c3

o5 3 1 4 2 c4

The example means that there are five objects in the input data set. Their indices are o1, o2, o3, o4 and o5.

• Object o1 belongs to class c1, and its attribute values are 3, 2, 4, 3.

• Object o2 belongs to class c1, and its attribute values are 2, 2, 1, 2.

• Object o3 belongs to class c2, and its attribute values are 1, 1, 2, 1.

• Object o4 belongs to class c3, and its attribute values are 2, 2, 3, 3.

• Object o5 belongs to class c4, and its attribute values are 3, 1, 4, 2.

NOTE: The initial character of an object index must be ‘O’ or ‘o’, followed by a number.

NOTE: Fields may be separated only by the SPACE character.
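The object rows can be validated against the class and attribute definitions. The Python sketch below is one possible way to do so and is not MCOCR's code. The name parse_objects is hypothetical, and two assumptions are made: each row has the layout "object index, attribute values, class index" described above, and an attribute declared with k values takes the integer values 1 to k, as in the attribute example.

```python
import re

# An object index is 'O' or 'o' followed by a number, e.g. o1.
OBJECT_INDEX = re.compile(r"[Oo]\d+")


def parse_objects(lines, attribute_counts, class_indices):
    """Return a list of (object index, attribute values, class index).

    attribute_counts: number of values of each attribute, in column order.
    class_indices: the class indices declared in the Classes line.
    """
    if not lines or lines[0].strip() != "Objects":
        raise ValueError("section must start with the 'Objects' tag")
    objects = []
    for line in lines[1:]:
        fields = [f for f in line.split(" ") if f]  # SPACE is the only separator
        if len(fields) != len(attribute_counts) + 2:
            raise ValueError(f"wrong number of fields: {line!r}")
        index, label = fields[0], fields[-1]
        if not OBJECT_INDEX.fullmatch(index):
            raise ValueError(f"invalid object index: {index!r}")
        if label not in class_indices:
            raise ValueError(f"undeclared class index: {label!r}")
        values = [int(v) for v in fields[1:-1]]
        # Assumption: an attribute with k values uses the codes 1..k.
        for value, limit in zip(values, attribute_counts):
            if not 1 <= value <= limit:
                raise ValueError(f"value {value} out of range in {line!r}")
        objects.append((index, values, label))
    return objects
```

With attribute counts 3, 2, 4, 3 and classes c1 to c4, the five example objects pass these checks, whereas a row such as "o6 4 1 1 1 c1" would be rejected because a1 has only three values.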

D A Sample Input Data File for MCOCR
