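National Science Council, Executive Yuan: Research Project Final Report

A Study on the Application of Fuzzy-Measure Choquet Integrals to Educational Test Analysis (I): Research Report (Condensed Version)

Project type: Individual

Project number: NSC 97-2410-H-468-014-

Project period: 1 August 2008 to 31 July 2009

Executing institution: Department of Biotechnology, Asia University

Principal investigator: Hsiang-Chuan Liu (劉湘川)

Co-principal investigator: 郭伯臣

Project personnel: Full-time assistant (bachelor's level): 林莞惠

Attachments: Report on attending an international conference, and published papers

Disposition: This project involves patents or other intellectual property rights; the report may be publicly accessed after 2 years

30 October 2009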

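Final Report (Condensed Version)

(2008/08/01~2009/07/31)

Project type: ■ Individual project □ Integrated project

Project number: NSC 97-2413-H-468-014-

Project period: 1 August 2008 to 31 July 2009

Principal investigator: Hsiang-Chuan Liu (劉湘川), Department of Bioinformatics and Department of Psychology, Asia University

Co-principal investigator: 郭伯臣, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education

Project personnel:

Full-time assistant:

林莞惠, Statistics and Information Section, Department of Applied Mathematics, Providence University

Part-time assistants:

劉育隆, doctoral program, Department of Computer Science and Information Engineering, Asia University

杜雨潔, doctoral program, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education

林士勛, Graduate Institute of Educational Measurement and Statistics, National Taichung University of Education

Report type (submitted as required by the approved funding list): Condensed report

Attachments: One report on attending an international academic conference and one copy of each published paper

Disposition: This research project involves patents or other intellectual property rights; the report may be publicly accessed after two years

Executing institution: Asia University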

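… [probability] is an additive measure of its probability densities. The main advantage of this convention is computational simplicity, but the "additivity of probability" clearly requires the basic assumption that "different probability densities do not interact", which does not fully hold in many practical applications. Other, non-additive measures have therefore emerged, such as the possibility measure, the plausibility measure, the belief measure, and the necessity measure.

In fact, these four non-additive measures and the well-known probability measure are all special cases of monotone measures. A monotone measure must satisfy many monotonicity conditions that are hard to determine simultaneously, so it carries an inherent vagueness. For this reason Sugeno, when he proposed the λ-measure in 1974, was the first to call monotone measures fuzzy measures. Following Sugeno's terminology, Zhenyuan Wang and George J. Klir published the first book on monotone measures in 1992 and titled it Fuzzy Measure Theory. This fuzzy measure theory generalizes classical measure theory.

In their early development, however, monotone measures dealt only with crisp numbers, not fuzzy numbers, so strictly speaking they should not have been called fuzzy measures. Monotone measures have since been extended from crisp numbers to fuzzy numbers, and a monotone measure on fuzzy numbers may be called a fuzzified monotone measure; if monotone measures were still called fuzzy measures, a "fuzzy measure on fuzzy numbers" would be ambiguous. Non-monotone measures have also been introduced. Accordingly, the expanded edition of Fuzzy Measure Theory that Wang and Klir published in 2009 was renamed Generalized Measure Theory. Because "fuzzy measure" is still the term that the academic community knows and communicates with most easily, this project also refers to monotone measures as fuzzy measures.

A monotone measure must be paired with a monotone integral to be of use; in other words, the Lebesgue integral must be correspondingly extended to a monotone integral. Just as monotone measures are called fuzzy measures, monotone integrals are often called fuzzy integrals. The first to propose a monotone integral that improves on the Lebesgue integral appears to have been the Italian mathematician Giuseppe Vitali (1925, 1997), who published it in Italian in 1925; it became known only after it was translated into English in 1997. It was rediscovered by the French mathematician Gustave Choquet, who proposed it again in 1954. After more than two decades of vigorous development it is now customarily called the Choquet integral, and this project uses that name as well. Sugeno also proposed a new fuzzy integral in 1974, the Sugeno integral, which together with his λ-measure is widely used in engineering, management, and other fields. However, the Sugeno integral is less sensitive than the Choquet integral and is not a generalization of the Lebesgue integral, so it is not included in this project for now.

Sugeno (1974) classified fuzzy measures into three types (sub-additive, additive, and super-additive), mainly because his λ-measure, depending on the value of λ, admits only these three possibilities, and the academic community has likewise assumed that fuzzy measures fall into only these three classes. In 2006 the author was the first to point out that, in practice, monotone mixed fuzzy measures cannot be ignored, and that Sugeno's λ-measure and Zadeh's (1978) possibility measure are both univalent fuzzy measures of limited applicability, so that multivalent fuzzy measures covering all four classes need to be developed. Since 2007 the author has proposed a series of families of multivalent fuzzy measures with these improved properties. Besides serving applications in engineering, management, and other fields, this work is intended to advance both theory and application and to be carried over to the field of educational testing.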

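II. Research Objectives and Methods

This project is the first year of a three-year research project, "A Study on the Application of Fuzzy-Measure Choquet Integrals to Educational Test Analysis (I)". This year's work mainly investigates applying the several new fuzzy-measure Choquet integral methods proposed by the author to build …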

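1. The author proposes the L-measure and investigates its basic mathematical properties.

2. The author proposes an improved measure, the completed L-measure, and investigates its basic mathematical properties.

3. The author proposes a new measure, the δ-measure, and investigates its basic mathematical properties.

4. The author proposes a composed multivalent fuzzy measure based on the L-measure and the δ-measure, and investigates its basic mathematical properties.

5. The author and colleagues propose the C-measure for grouped data and investigate its basic properties.

6. The author proposes the γ fuzzy density (γ-support) and verifies that it outperforms the C-support and the V-support.

B. Design of computer analysis programs

1. Program design for the Choquet integral regression prediction model based on the L-measure with γ-support

2. Program design for the Choquet integral regression prediction model based on the completed L-measure with γ-support

3. Program design for the Choquet integral regression prediction model based on the δ-measure with γ-support

4. Program design for the Choquet integral regression prediction model based on the L(δ) composed multivalent fuzzy measure with γ-support

5. Program design for the Choquet integral regression prediction model based on the C-measure with γ-support

C. Two empirical data sets for educational assessment prediction

1. For 60 students at a junior high school in Miaoli, graduation grades in junior high school mathematics, physics and chemistry, biology, and earth science are used to predict their natural science scores on the Basic Competence Test for Junior High School Students.

2. For 128 students at an elementary school in Taichung, the body-fat percentage estimated from skinfold thickness at four sites (triceps, biceps, subscapular, and suprailiac) is used to predict the body-fat percentage measured by a body-fat meter.

D. Using the mean squared error of the predictions as the comparison criterion, the prediction models are compared on the above data by k-fold cross-validation.

The prediction models are listed below:

1. Multiple regression prediction model

2. Ridge regression prediction model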

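5. Choquet integral regression prediction model based on the L-measure with γ-support

6. Choquet integral regression prediction model based on the completed L-measure with γ-support

7. Choquet integral regression prediction model based on the δ-measure with γ-support

8. Choquet integral regression prediction model based on the L(δ) composed multivalent fuzzy measure with γ-support

9. Choquet integral regression prediction model based on the C-measure with γ-support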
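The k-fold cross-validation comparison by mean squared error described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `k_fold_mse`, `fit_mean`, and `fit_line` are hypothetical, the fold assignment is a simple interleaving, and the two toy models stand in for the regression and Choquet integral models of the report, whose programs are not reproduced here.

```python
def k_fold_mse(xs, ys, fit, k=5):
    """Mean squared prediction error of `fit` under k-fold cross-validation.

    `fit(train_x, train_y)` must return a function that predicts y from x.
    """
    n = len(xs)
    folds = [list(range(start, n, k)) for start in range(k)]  # interleaved folds
    squared_error = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        predict = fit(train_x, train_y)          # fit on the other k-1 folds
        squared_error += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return squared_error / n


def fit_mean(train_x, train_y):
    """Toy baseline model: always predict the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_line(train_x, train_y):
    """Toy model: least-squares line on a single predictor."""
    n = len(train_x)
    mean_x, mean_y = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


xs = list(range(10))
ys = [2 * x + 1 for x in xs]                     # made-up data, exactly linear
for name, model in [("mean", fit_mean), ("line", fit_line)]:
    print(name, k_fold_mse(xs, ys, model, k=5))
```

The model with the smaller cross-validated mean squared error is preferred; any model that exposes the same `fit` interface could be compared in the same loop.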

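III. Research Results and List of Published Papers

A. Important mathematical properties of the L-measure and its Choquet integral regression prediction model

(i) The following important mathematical properties of the L-measure are established:

1. The L-measure satisfies boundedness and monotonicity and is therefore a fuzzy measure.

2. The L-measure is a continuous increasing function of the determining coefficient $L$ on the domain $[0, \infty)$.

3. The L-measure is a multivalent fuzzy measure: for $L \in [0, \infty)$, different values of the determining coefficient $L$ determine different fuzzy measures. In other words, the L-measure has infinitely many fuzzy-measure solutions, and its formula has a closed form.

4. When $L = 0$, the L-measure is exactly Zadeh's P-measure.

5. The L-measure can be a mixed fuzzy measure, a sub-additive fuzzy measure, or a super-additive fuzzy measure.

(ii) The computer analysis program for the Choquet integral regression prediction model based on the L-measure with γ-support was completed.

(iii) Two EI-indexed papers, listed below, were published. They verify that the L-measure Choquet integral regression prediction model outperforms the multiple regression prediction model, the ridge regression prediction model, the Choquet integral regression prediction model based on Zadeh's P-measure with γ-support, and the Choquet integral regression prediction model based on Sugeno's λ-measure with γ-support.

(See the attached international conference papers and conference report.)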

1. Hsiang-Chuan Liu, Yu-Chieh Tu, Wen-Chih Lin, and Chin-Chun Chen (2008).

Choquet integral regression model based on L-Measure and γ-Support. Proceedings of

2008 International Conference on Wavelet Analysis and Pattern Recognition. (Hong

Kong, 30-31, Aug. 2008.) Volume: 2, pp.777-782. ISBN: 978-1-4244-2238-8. (EI

paper)

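2. Hsiang-Chuan Liu, Yu-Du Jheng, Guey-Shya Chen and Bai-Cheng Jeng. (2008) Choquet Integral Logistic Regression Algorithms Based on L-Measure and γ-Support. Proceedings of 2008 International Conference on Wavelet Analysis and Pattern Recognition. (Hong Kong, 30-31, Aug. 2008.) Volume: 2, pp.771-776. ISBN: 978-1-4244-2238-8. INSPEC Accession Number: 10299006. (EI paper)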

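B. Important mathematical properties of the completed L-measure and its Choquet integral regression prediction model

(i) The following important mathematical properties of the completed L-measure are established:

1. For given fuzzy densities, the maximal fuzzy measure (the B-measure) and the notion of a completed measure are defined. It is shown that Sugeno's λ-measure, Zadeh's P-measure, and the L-measure do not include the B-measure; in other words, none of the three is a completed measure.

2. The completed L-measure satisfies boundedness and monotonicity and is therefore a fuzzy measure.

3. The completed L-measure is [a function] of the determining coefficient $L$ on the domain $[0, \infty)$ …

4. The completed L-measure is a multivalent fuzzy measure: for $L \in [0, \infty)$, different values of the determining coefficient $L$ determine different fuzzy measures. In other words, the completed L-measure has infinitely many fuzzy-measure solutions, and its formula has a closed form.

5. When $L = 0$, the completed L-measure is exactly the minimal measure, Zadeh's P-measure.

6. As $L \to \infty$, the completed L-measure is exactly the maximal measure, the B-measure.

7. The completed L-measure can be a mixed fuzzy measure, a sub-additive fuzzy measure, or a super-additive fuzzy measure.

(ii) The computer analysis program for the Choquet integral regression prediction model based on the completed L-measure with γ-support was completed.

(iii) An EI-indexed paper, also published in a book, is listed below. It verifies that the completed L-measure Choquet integral regression prediction model outperforms the multiple regression prediction model, the ridge regression prediction model, the Zadeh P-measure Choquet integral regression prediction model, the Sugeno λ-measure Choquet integral regression prediction model, and the L-measure Choquet integral regression prediction model.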

1. Hsiang-Chuan Liu, “A theoretical approach to the completed L-fuzzy

measure”, Conference Proceedings of 2009 International Institute of Applied Statistics Studies (2009IIASS), July 24-28 2009.Qindao, China, pp. 1121-1124, 2009.

ISBN:978-0-9806057-4-7. (EI paper)

2. Hsiang-Chuan Liu (2009). “A theoretical approach to the completed L-fuzzy measure”,

Quantitative Analysis Techology and Related Engineering Applications, pp. 1121-1124,

2009, AUSSINO ACADEMIC PUBLISH HOUSE Sydney Australia, ZHU Koulai &

Henry ZHANG, ISBN:978-0-9806057-4-7.

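1. The δ-measure satisfies boundedness and monotonicity and is therefore a fuzzy measure.

2. The δ-measure is [a function] of the determining coefficient $\delta$ on the domain $[-1, 1]$ …

3. The δ-measure is a multivalent fuzzy measure: for $\delta \in [-1, 1]$, different values of the determining coefficient $\delta$ determine different fuzzy measures. In other words, the δ-measure has infinitely many fuzzy-measure solutions, and its formula has a closed form.

4. When $\delta = -1$, the δ-measure is exactly Zadeh's P-measure.

5. When $\delta = 0$, the δ-measure is exactly the additive measure. When the fuzzy densities sum to 1, Sugeno's λ-measure is the additive measure, so in this case the δ-measure also coincides with the λ-measure. It is also shown that neither the L-measure nor the completed L-measure includes the additive measure.

6. When $-1 \le \delta < 0$, the δ-measure is a sub-additive measure; when $0 < \delta \le 1$, the δ-measure is a super-additive measure.

7. The δ-measure cannot be a mixed fuzzy measure or a completed measure.

(ii) The computer analysis program for the Choquet integral regression prediction model based on the δ-measure with γ-support was completed.

(iii) An EI-indexed journal paper, listed below, was published. It verifies that the δ-measure Choquet integral regression prediction model outperforms the multiple regression prediction model, the ridge regression prediction model, the Zadeh P-measure Choquet integral regression prediction model, and the Sugeno λ-measure Choquet integral regression prediction model.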

Hsiang-Chuan Liu, Der-Bang Wu, Yu-Du Jheng and Tian-Wei Sheu (2009).

“Theory of Multivalent Delta-Fuzzy Measures and its Application”, WSEAS

TRANSACTION ON INTERNATIONAL SCIENCE AND APPLICATION ,Vol. 6, No.

6 1061-1070, June 2009. ISSN: 1790-0832. (EI Journal)

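D. Important mathematical properties of the composed multivalent fuzzy measure based on the L-measure and the δ-measure, and its Choquet integral regression prediction model

(i) The following important mathematical properties of the composed multivalent fuzzy measure based on the L-measure and the δ-measure, the L(δ)-measure, are established:

1. The L(δ)-measure satisfies boundedness and monotonicity and is therefore a fuzzy measure.

2. The L(δ)-measure is [a function] of the determining coefficient $L$ on the domain $[-1, \infty)$ …

3. The L(δ)-measure is a multivalent fuzzy measure: for $L \in [-1, \infty)$, different values of the determining coefficient $L$ determine different fuzzy measures. In other words, the L(δ)-measure has infinitely many fuzzy-measure solutions, and its formula has a closed form.

4. When $L = -1$, the L(δ)-measure is exactly Zadeh's P-measure.

5. When $L = 0$, the L(δ)-measure is exactly the additive measure. When the fuzzy densities sum to 1, Sugeno's λ-measure is the additive measure, so in this case the L(δ)-measure also coincides with the λ-measure.

6. When $-1 \le L < 0$, the L(δ)-measure is a sub-additive measure; when $0 < L < \infty$, the L(δ)-measure is a super-additive measure.

7. The L(δ)-measure cannot be a mixed fuzzy measure or a completed measure.

(ii) The computer analysis program for the Choquet integral regression prediction model based on the L(δ)-measure with γ-support was completed.

(iii) An EI-indexed journal paper, listed below, was published. It verifies that the L(δ)-measure Choquet integral regression prediction model outperforms the multiple regression prediction model, the ridge regression prediction model, the Zadeh P-measure Choquet integral regression prediction model, the Sugeno λ-measure Choquet integral regression prediction model, the L-measure Choquet integral regression prediction model, and the δ-measure Choquet integral regression prediction model.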

Hsiang-Chuan Liu, Chin-Chun Chen, Der-Bang Wu, and Tian-Wei Sheu (2009).

“Theory and Application of the Composed Fuzzy Measure of L-Measure and Delta-Measures”, WSEAS TRANSACTION ON INTERNATIONAL SCIENCE AND CONTROL, Issue 8, Vol. 4, pp. 359-368, August 2009.

ISSN: 1991-8763.

(EI Journal)

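E. Important mathematical properties of the composed multivalent fuzzy measure based on the C-measure and the δ-measure, and its Choquet integral regression prediction model

(i) The following important properties of the C-measure are established:

1. The complexity-based C-measure satisfies boundedness and monotonicity and is therefore a fuzzy measure.

2. The C-measure is a fuzzy measure suited to grouped data.

(ii) The computer analysis program for the C-measure Choquet integral prediction model was completed.

(iii) An SCI-indexed journal paper, listed below, was published. It verifies that the C-measure Choquet integral prediction model outperforms the multiple regression prediction model, the ridge regression prediction model, the Zadeh P-measure Choquet integral regression prediction model, the Sugeno λ-measure …

F. The γ fuzzy density (γ-support), based on the Pearson correlation coefficient, is proposed and verified to outperform the C-support and the V-support. The published papers (the same papers as above) are as follows: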

1. Hsiang-Chuan Liu, Yu-Chieh Tu, Wen-Chih Lin, and Chin-Chun Chen (2008).

Choquet integral regression model based on L-Measure and γ-Support. Proceedings of

2008 International Conference on Wavelet Analysis and Pattern Recognition. (Hong

Kong, 30-31, Aug. 2008.) Volume: 2, pp.777-782. ISBN: 978-1-4244-2238-8. (EI

paper)

2. Hsiang-Chuan Liu, Yu-Du Jheng, Guey-Shya Chen and Bai-Cheng Jeng. (2008) Choquet Integral Logistic Regression Algorithms Based on L-Measure and γ-Support.

Proceedings of 2008 International Conference on Wavelet Analysis and Pattern Recognition. (Hong Kong, 30-31, Aug. 2008.) .Volume: 2, pp.771-776. ISBN:

978-1-4244-2238-8. INSPEC Accession Number: 10299006. (EI paper)

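IV. Conclusion

In its first year, through mathematical analysis, this project proposed an effective fuzzy density (the γ-support), four improved multivalent fuzzy measures, and one fuzzy measure suitable for grouped data. It also completed the computer analysis programs for the Choquet integral regression models of these fuzzy measures with γ-support, as well as for the multiple regression and ridge regression prediction models, and carried out a five-fold cross-validation comparison on two educational test data sets. The Choquet integral regression models of all these fuzzy measures gave effective results. One SCI journal paper, two EI journal papers, and three EI conference papers were published.

V. Appendix: Published Paper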

Applying a complexity-based Choquet integral to evaluate students’ performance

Jiunn-I Shieh^a,*, Hsin-Hung Wu^b, Hsiang-Chuan Liu^c

aDepartment of Information Science and Applications, Asia University, No. 500 Lioufeng Road, Wufeng Shiang, Taichung 413, Taiwan

bDepartment of Business Administration, National Changhua University of Education, Taiwan

cInstitute of Bioinformatics, Asia University, Taiwan

Article info

Keywords: Fuzzy measure; Discrete Choquet integral; Entropy; Complexity

Abstract

The weighted arithmetic mean and regression methods are the operators most often used to aggregate criteria in decision making problems, under the assumption that there are no interactions among criteria. When interactions among criteria exist, the discrete Choquet integral has been shown to be an adequate aggregation operator because it takes the interactions into account. In this study, we propose a complexity-based method to construct the fuzzy measures needed by the discrete Choquet integral, and a real data set is analyzed. The advantage of the complexity-based method is that no population probability has to be estimated, so the error of estimating the population probability is avoided. Four methods, namely the weighted arithmetic mean method, the regression-based method, the discrete Choquet integral with the entropy-based method, and our proposed discrete Choquet integral with the complexity-based method, are used in this study to evaluate students' performance based on a Basic Competence Test. The results show that the evaluation of the students' overall performance by our proposed discrete Choquet integral with the complexity-based method is the best among the four methods when interactions among criteria exist.

1. Introduction

The most often used operator to aggregate criteria in decision making problems is the classical weighted arithmetic mean (Fishburn, 1970). In many practical applications the decision criteria present some interaction. However, the problem of modeling such an interaction remains a difficult question, which is often overlooked (Domingo & Torra, 2002). The reason is that practitioners lack suitable tools to deal with the interactions, so criteria are assumed to be independent and exhaustive. This comes primarily from the absence of a precise definition of interactions as well as the complexity and difficulty of identifying the interaction phenomena among criteria. It is known that mutual independence among the criteria is a necessary condition for an aggregation operator to be additive. That is, if some criteria are preferentially dependent on the others, then no additive aggregation operator can model the preferences of the decision maker (Domingo & Torra, 2002).

The weighted arithmetic mean and regression methods are unable to overcome the undesirable phenomenon of dependence. In contrast, the Choquet integral takes into account the interactions among criteria. In addition, a key issue remains unsolved in the application of the fuzzy integral: the determination of the density values that decide the fuzzy measures in the fusion process. In this study, the entropy-based method and our proposed complexity-based method to construct the fuzzy measures in the discrete Choquet integral are discussed.

This paper is outlined as follows: Section 2 reviews weighting methods, fuzzy measures, and discrete Choquet integrals with two different constructions of fuzzy measures. A procedure for using the Choquet integral is provided in Section 3. A case study applying the weighted arithmetic mean method, the regression method, the Choquet integral with the entropy-based method, and our proposed Choquet integral with the complexity-based method is performed in Section 4 to analyze the students' overall performance on the Basic Competence Test when interactions exist. Finally, conclusions are summarized in Section 5.

2. Weighting methods, fuzzy measures, and discrete Choquet integral

The classical weighted arithmetic mean method is the most commonly used operator to aggregate criteria in decision making problems without considering the interactions among criteria. The regression method maximizes the linear relation among the criteria, again without taking the interactions among criteria into consideration. On the contrary, the discrete Choquet integral is proved to be an adequate aggregation operator that …

… by Murofushi and Sugeno (1989, 1991).

The Choquet integral is defined to integrate functions with respect to fuzzy measures (Murofushi & Sugeno, 1989). Fuzzy integrals are very useful for global evaluation models, but the number of parameters of fuzzy measures is large. The definitions of fuzzy measures and Choquet integrals are as follows (Murofushi & Sugeno, 1989):

Definition 1. Let $N$ be a finite set of criteria. A discrete fuzzy measure on $N$ is a set function $v: 2^N \to [0, 1]$ which satisfies the following axioms:

(i) $v(\emptyset) = 0$, $v(N) = 1$ (boundary conditions);

(ii) $A \subseteq B$ implies $v(A) \le v(B)$ (monotonicity) for $A, B \in 2^N$.

For each subset of criteria $S \subseteq N$, $v(S)$ can be interpreted as the weight of the coalition $S$.

Definition 2. Let $v$ be a fuzzy measure on $N = \{1, 2, \ldots, n\}$. The discrete Choquet integral of a function $x: N \to \mathbb{R}$ with respect to $v$ is defined by

$$C_v(x) = \sum_{i=1}^{n} x_{(i)} \left[ v(A_{(i)}) - v(A_{(i+1)}) \right],$$

where $(\cdot)$ indicates a permutation on $N$ such that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Also $A_{(i)} = \{(i), \ldots, (n)\}$ and $A_{(n+1)} = \emptyset$. For instance, if $x_1 \le x_3 \le x_2$, then rank $x_1, x_2, x_3$ from the smallest to the largest. The result is $x_{(1)} = x_1$, $x_{(2)} = x_3$, $x_{(3)} = x_2$. Finally,

$$C_v(x_1, x_2, x_3) = x_1\, v(\{(1), (2), (3)\}) + (x_3 - x_1)\, v(\{(2), (3)\}) + (x_2 - x_3)\, v(\{(3)\}).$$
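Definition 2 can be sketched in code as follows. This is a minimal illustration, not the authors' program: it uses the telescoped form shown in the three-variable example (each increment of the sorted scores is weighted by the measure of the criteria still at or above it), the helper `additive_measure` and the weights 0.5, 0.3, 0.2 are hypothetical, and the scores 70, 81, 75 are those of the first student in Table 1.

```python
from itertools import combinations


def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to `measure`.

    scores  -- dict mapping each criterion to its value x_i
    measure -- dict mapping each frozenset of criteria S to v(S)
    """
    order = sorted(scores, key=scores.get)       # criteria by ascending score
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(order):
        coalition = frozenset(order[i:])         # A_(i) = {(i), ..., (n)}
        total += (scores[criterion] - previous) * measure[coalition]
        previous = scores[criterion]
    return total


def additive_measure(weights):
    """Hypothetical helper: the additive fuzzy measure induced by weights summing to 1."""
    names = list(weights)
    return {
        frozenset(subset): sum(weights[c] for c in subset)
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    }


scores = {"D1": 70, "D2": 81, "D3": 75}          # first student in Table 1
weights = {"D1": 0.5, "D2": 0.3, "D3": 0.2}      # illustrative weights, not from the paper
v = additive_measure(weights)

# With an additive measure the Choquet integral reduces to the weighted mean.
print(choquet_integral(scores, v))
print(sum(weights[c] * scores[c] for c in scores))
```

Two extreme measures give a quick sanity check: if only the full set has measure 1, the integral returns the smallest score, and if every non-empty set has measure 1, it returns the largest.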

The discrete Choquet integral takes into account the interaction by means of the fuzzy measure $v$. If the criteria are independent, the fuzzy measure is additive. Then the discrete Choquet integral coincides with the weighted arithmetic mean method. That is, $C_v(x) = \sum_{i=1}^{n} v(\{i\})\, x_i$, $x \in \mathbb{R}^n$. For example, there are five students and three courses ($D_1$, $D_2$, and $D_3$). Assume the raw data and a fuzzy measure $\mu$ on each subset are in Tables 1 and 2, respectively. In Table 2, (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1), and (1, 1, 1) represent the empty set, $\{D_1\}$, $\{D_2\}$, $\{D_1, D_2\}$, $\{D_3\}$, $\{D_1, D_3\}$, $\{D_2, D_3\}$, and $\{D_1, D_2, D_3\}$, respectively. For the first student, the raw scores are 70, 81, and 75. First, rank the scores from the smallest to the largest, i.e., 70, 75, and 81. Then, the overall performance

Choquet integral are 77.6667, 74.6667, 81.0002, and 77.8334, respectively.

To evaluate a discrete Choquet integral, we need a fuzzy measure first. How to find a suitable fuzzy measure becomes an issue. To be a fuzzy measure, the measure needs to satisfy the axioms of the fuzzy measure. We note that the entropy measure and the complexity-based measure both qualify as fuzzy measures. The former is proposed by Kojadinovic (2004) and the latter is proposed in our study.

To measure the uncertainty of a random variable, the concept of entropy was introduced (Shannon, 1948). The basic idea is that an item with large entropy in its ratings is more important in a user's interest than an item with small entropy. Based on this idea, an entropy-based method is as follows (Yu, Wen, Xu, & Ester, 2001): Given a discrete random variable $A$, let $p_A$ be the probability of $A$, then define the entropy of $A$ to be $h(A) = -\sum p_A \log_2 p_A$, where $p_A > 0$. With a similar formula, let $B$ be a discrete random vector which contains at least two discrete random variables; generalizing this idea to a random vector, call $p_B$ the joint probability and $h(B)$ the joint entropy. By using the idea of joint entropy to calculate the entropy of the subsets of criteria of $N$, define the fuzzy measure $\mu_1$ as $\mu_1(S) = h(S)/h(N)$ for all $S \subseteq N$ (Kojadinovic, 2004). To use entropy, we need to decide the number of levels used to classify the raw data into score levels for each criterion. For example, let the number of levels be 2 and let $S$ contain only two random variables $X_1$ and $X_2$. In addition, assume the raw data are in Table 3.

The raw data in Table 3 can be classified into Table 4 by histogram equalization with the "hist.m" program of Matlab 7.0 for each random variable. To generate the complete information of the fuzzy measure $\mu_1$, first compute $h(N)$. A joint pattern (1, 2) means that $X_1 = 1$ and $X_2 = 2$, and (2, 2) means that $X_1 = 2$ and $X_2 = 2$. There are 3 of pattern (1, 2) and 2 of pattern (2, 2). Thus, $p_S(X_1 = 1, X_2 = 2) = 3/5 = 0.6$ and $p_S(X_1 = 2, X_2 = 2) = 2/5 = 0.4$. Therefore, $h(N) = -0.6 \log_2(0.6) - 0.4 \log_2(0.4) = 0.9710$. Next, $h(S)$ is computed for $S = X_1$ and $S = X_2$, i.e., $h(X_1)$ and $h(X_2)$. In this case, there are 3 of pattern "1" and 2 of pattern "2" in $X_1$. From Table 4, $p_{X_1}(X_1 = 1) = 3/5 = 0.6$, $p_{X_1}(X_1 = 2) = 2/5 = 0.4$, and $h(X_1) = -0.6 \log_2(0.6) - 0.4 \log_2(0.4) = 0.9710$. In contrast to $X_1$, there are 5 of pattern "2" in $X_2$. From Table 4, $p_{X_2}(X_2 = 1) = 0/5 = 0$, $p_{X_2}(X_2 = 2) = 5/5 = 1$,

Table 2
A fuzzy measure used to demonstrate computation of the overall performance by Choquet integral

D1   D2   D3   Fuzzy measure μ
0    0    0    0
1    0    0    0.1667
0    1    0    0.1667
1    1    0    0.5

Table 1
Example of the raw data used to demonstrate computation of the overall performance by Choquet integral

Student   D1   D2   D3
1         70   81   75
2         70   85   86
3         65   85   84
4         75   91   85
5         75   80   82

Table 3
Example of the raw data used to construct fuzzy measures based on entropy and complexity methods

Student   X1   X2
1         70   81
2         70   85
3         65   85
4         75   91
5         75   80

Table 4
The level of the score for each criterion classified from the raw data in Table 3 when the number of levels is two

Student   X1   X2
1         1    2


od is also easy to compute the fuzzy measure of a random vector with more than two discrete random variables. However, the entropy-based weighting scheme might take the risk to estimate the probability for each criterion. If the sample size is small, it often makes a larger error to estimate the population probability. Under such circumstances, we propose a complexity method to improve the prediction.
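The entropy-based construction $\mu_1(S) = h(S)/h(N)$ can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the level vectors below are one ordering consistent with the pattern counts stated in the text (three joint patterns (1, 2) and two joint patterns (2, 2)), since Table 4 is not fully reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(columns):
    """Joint entropy (base 2) of the empirical distribution of the given level columns."""
    if not columns:
        return 0.0                               # h of the empty set is taken as 0
    patterns = Counter(zip(*columns))            # joint pattern -> count
    n = sum(patterns.values())
    return -sum((c / n) * log2(c / n) for c in patterns.values())


def entropy_measure(levels):
    """mu_1(S) = h(S) / h(N) for every subset S; assumes h(N) > 0."""
    names = list(levels)
    h_all = joint_entropy([levels[k] for k in names])
    return {
        frozenset(subset): joint_entropy([levels[k] for k in subset]) / h_all
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    }


# Assumed two-level classification consistent with the counts given in the text.
levels = {"X1": [1, 1, 1, 2, 2], "X2": [2, 2, 2, 2, 2]}
h_n = joint_entropy(list(levels.values()))
mu1 = entropy_measure(levels)
print(round(h_n, 4))
for subset, value in mu1.items():
    print(sorted(subset), round(abs(value), 4))
```

Because every probability is estimated from the sample, a small sample makes these entropies noisy, which is the concern raised in the paragraph above.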

The basic concept of complexity is that the more substructures a system has, the more complex the system is. This concept agrees with our intuitive understanding that it is the connectedness of the system elements that matters most. Thus, the more connected the system, the higher the number of substructures in it. It is therefore reasonable to count how many substructures a structure contains (Bonchev & Rouvray, 2003). The complexity $C$ of a discrete random variable $X$ is defined to be the function which counts the number of distinct patterns in $X$. The complexity $C$ of $n$ discrete random variables $X_1, X_2, \ldots, X_n$ is defined as the function which counts the number of distinct patterns in the joint pattern of $X_1, X_2, \ldots, X_n$. For a finite number of random variables $X_1, X_2, \ldots, X_n$, the complexity is finite. Thus, $C(X_1, X_2, \ldots, X_n)$ can always be normalized to 1. Moreover, it is natural to define $C(\emptyset)$ to be zero, where $\emptyset$ is the empty set. By using the idea of complexity $C$ to calculate the complexity of the subsets of criteria of $N$, define $C_1$ as $C_1(S) = C(S)/C(N)$ for all $S \subseteq N$. It is easy to check that $C_1$ is monotone. That is, $X \subseteq Y$ implies $C_1(X) \le C_1(Y)$ for $X, Y \in 2^N$. In addition, $C_1(\emptyset) = 0$ and $C_1(N) = 1$. By the definition of a fuzzy measure, $C_1$ is a fuzzy measure.

Let the number of levels be 2 and let $S$ contain only two random variables $X_1$ and $X_2$. The raw data from Table 3 can be classified by histogram equalization with the "hist.m" program of Matlab 7.0 for each random variable, as shown in Table 4. To generate the complete information of the fuzzy measure $C_1$, first compute $C(N)$. From Table 4, there are two different joint patterns, i.e., (1, 2) and (2, 2). Thus, the complexity of $N$ is 2. Next, $C(S)$ is computed for $S = X_1$ and $S = X_2$. That is, compute $C(X_1)$ and $C(X_2)$. There are two different patterns in $X_1$, so $C(X_1) = 2$. Moreover, there is only 1 pattern in $X_2$, i.e., $C(X_2) = 1$. By $C_1(S) = C(S)/C(N)$ for all $S \subseteq N$, the fuzzy measure $C_1$ is completely defined by Table 6. Although our example computes the fuzzy measure of a random vector with two discrete random variables, the complexity method makes it just as easy to compute the fuzzy measure of a random vector with more than two discrete random variables.
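The complexity-based construction $C_1(S) = C(S)/C(N)$ amounts to counting distinct patterns, as the following sketch illustrates. The function names are hypothetical, and the level vectors are an assumed ordering consistent with the pattern counts given in the text (joint patterns (1, 2) and (2, 2) only).

```python
from itertools import combinations


def complexity(columns):
    """Number of distinct joint patterns in the given level columns; 0 for no columns."""
    if not columns:
        return 0
    return len(set(zip(*columns)))


def complexity_measure(levels):
    """C_1(S) = C(S) / C(N) for every subset S of the criteria."""
    names = list(levels)
    c_all = complexity([levels[k] for k in names])
    return {
        frozenset(subset): complexity([levels[k] for k in subset]) / c_all
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    }


# Assumed two-level classification consistent with the counts given in the text.
levels = {"X1": [1, 1, 1, 2, 2], "X2": [2, 2, 2, 2, 2]}
c1 = complexity_measure(levels)
for subset, value in c1.items():
    print(sorted(subset), value)
```

No probability is estimated at any point, only pattern counts, which is the property the text gives as the motivation for the method.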

a Basic Competence Test to evaluate the students’ performance.

3. A procedure of using the discrete Choquet integral

A five-step procedure for applying the Choquet integral, based on Calvo, Kolesarova, Komornikova, and Mesiar (2001), is as follows:

Step 1: Decide the range of the number of levels used to classify the raw data into score levels for each criterion, using Scott's rule and Sturges' formula. Let m be the number of score levels; m = 2, 3, 4, 5, 6, 7, 8, 9 is the range in our study. Then, transform the scores of the raw data into score levels for each item when m = 2, 3, 4, 5, 6, 7, 8, 9.

Step 2: Check the mutual interaction and the strength of interaction among criteria. First, calculate the Chi-square divergence between a pair of criteria, and use statistical test to determine if there is any mutual interaction among the criteria for each m = 2, 3, 4, 5, 6, 7, 8, 9. For the analysis of correlation, we chose Cramer’s coefﬁcients to determine if there is strong mutual interaction among criteria. Compute Cramer’s coefﬁcients for each m = 2, 3, 4, 5, 6, 7, 8, 9. Note that if there is no interaction among criteria, we expect that the accuracy of the Choquet integral method is as well as that of weighted arithmetic mean method.

Step 3. For each m make the following calculations: (1) use credit hours to get the weight for each course; (2) use regression method to get the weight for each course; (3) by using the results from Step 2, compute fuzzy measures based on entropy and joint entropy for each subset of all courses. Then, the importance for each subset is resolved; (4) use the results from Step 2, compute fuzzy measures based on the complexity for each subset of all courses. Thus, the importance for each subset is available.

Step 4: Calculate the weighted arithmetic mean and regression methods among all courses from the raw data. Later, transform the results into the level of the scores for each course when m = 2, 3, 4, 5, 6, 7, 8, 9. Use the Choquet integral with the entropy method and the complexity-based method to compute overall performance values discussed in Step 3 for each m = 2, 3, 4, 5, 6, 7, 8, 9. Finally, transform the results into the level of the scores for each m = 2, 3, 4, 5, 6, 7, 8, 9.

Step 5: Calculate the accuracy for each method for each m = 2, 3, 4, 5, 6, 7, 8, 9.
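The two bin-count rules named in Step 1 can be sketched as follows. This assumes the standard form of Scott's rule, a bin width of $3.49\, s\, n^{-1/3}$ with the number of levels taken as the data range divided by that width, and Sturges' formula in the form $m = 1 + 3.3 \log_{10}(n)$ quoted in the case study. The function names are hypothetical; the inputs are the sample size, ranges, and standard deviations reported in the case study.

```python
from math import log10


def scott_levels(data_range, s, n):
    """Number of levels from Scott's rule: range / (3.49 * s * n**(-1/3))."""
    bin_width = 3.49 * s * n ** (-1 / 3)
    return data_range / bin_width


def sturges_levels(n):
    """Number of levels from Sturges' formula as quoted in the text."""
    return 1 + 3.3 * log10(n)


n = 45                                           # class size in the case study
ranges = [35, 26, 28, 45]                        # R for each item, as reported
std_devs = [8.4887, 6.7182, 7.6480, 10.7691]     # s for each item, as reported
scott = [scott_levels(r, s, n) for r, s in zip(ranges, std_devs)]
print([round(m, 4) for m in scott])
print(round(sturges_levels(n), 4))
```

Under this assumed form, Scott's rule gives values close to 4 for all four items, in line with the case study's conclusion that m = 4 or 5 are candidates, while Sturges' formula suggests a somewhat larger m.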

4. A case study

A data set comes from a class with 45 students in a junior high school, and each student took three courses (namely physics and chemistry, biology, and geoscience) for natural science. The credit hours for these three courses are 16, 4, and 4, respectively. The maximum score for each course is 100 points. Later, all students took a Basic Competence Test for all junior high school students.

The maximum and minimum scores of the Basic Competence Test are 60 and 1, respectively. To simplify the notation, physics and chemistry, biology, and geoscience are denoted by C1, C2, and C3, while the score of the Basic Competence Test is denoted by Obj. The detailed information is depicted in Table 7.

Table 5
A fuzzy measure constructed by the entropy method

X1   X2   Fuzzy measure μ1
0    0    0/0.9710 = 0
1    0    0.9710/0.9710 = 1
0    1    0/0.9710 = 0
1    1    0.9710/0.9710 = 1

Table 6
A fuzzy measure constructed by the complexity method

X1   X2   Fuzzy measure C1

(Scott, 1979). In practice, $\sigma$ is replaced by the estimated standard deviation, $s$. In our study, the sample size $n$ is 45. From the raw data, $R$ = 35, 26, 28, and 45 for each item and $s$ = 8.4887, 6.7182, 7.6480, and 10.7691, respectively. By the above formula, $m$ would be 4.2021, 3.9443, 3.7313, and 4.2587, respectively. Thus, m = 4 or 5 are possible candidates. The other one is Sturges' formula: $m = 1 + 3.3 \log_{10}(n)$ (Scott, 1992). From the latter formula,

tions at a significance level of 0.01 among courses and observe the strength of mutual interactions among courses. First, use the results in Step 1 and the "crosstab.m" program of Matlab 7.0 to compute the corresponding p-values and the Chi-square divergence between each pair of criteria for each m = 2, 3, 4, 5, 6, 7, 8, 9. Then compute Cramér's correlation coefficient from the Chi-square values by the following formula: $G = \sqrt{\chi^2 / (nL)}$, where $n = 45$ and $L = m - 1$ for each m = 2, 3, 4, 5, 6, 7, 8, 9. From the p-values under m = 2, 3, 4, 5, 6, 7, 8, 9, summarized in Table 8, there clearly exist mutual interactions among courses at a significance level of 0.01 when m = 2, 3, 4, 5, 6, 7, 8, but not when m = 9. From Cramér's correlation coefficients in Table 8, we see that the mutual interactions among courses are strong. Thus, we expect the accuracy of the Choquet integral method to be better than those of the weighted arithmetic mean and regression methods when m = 2, 3, 4, 5, 6, 7, 8.
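The coefficient $G = \sqrt{\chi^2/(nL)}$ with $L = m - 1$ can be sketched in plain Python as follows. The paper computes the Chi-square values with Matlab's crosstab; the functions below are a hypothetical stand-in that computes the Pearson Chi-square statistic directly from two columns of score levels, and the small level vectors are made up for illustration.

```python
from collections import Counter
from math import sqrt


def chi_square(x_levels, y_levels):
    """Pearson chi-square statistic of the contingency table of two level columns."""
    n = len(x_levels)
    joint = Counter(zip(x_levels, y_levels))     # observed cell counts
    row, col = Counter(x_levels), Counter(y_levels)
    statistic = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n       # count expected under independence
            statistic += (joint[(a, b)] - expected) ** 2 / expected
    return statistic


def cramers_coefficient(x_levels, y_levels, m):
    """G = sqrt(chi^2 / (n * L)) with L = m - 1, for m score levels."""
    n = len(x_levels)
    return sqrt(chi_square(x_levels, y_levels) / (n * (m - 1)))


# Made-up two-level examples: perfectly associated and independent criteria.
print(cramers_coefficient([1, 1, 2, 2], [1, 1, 2, 2], m=2))
print(cramers_coefficient([1, 1, 2, 2], [1, 2, 1, 2], m=2))
```

Values near 1 indicate strong interaction between two criteria and values near 0 indicate none, which is how the coefficients in Table 8 are read.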

The third step is to calculate the importance of each course by the weighted arithmetic mean and regression methods; the results are summarized in Table 9. From Table 9, C1 (physics and chemistry) has higher importance than C2 (biology) and C3 (geoscience) by the weighted arithmetic mean method, i.e., C1 > C2 = C3. In contrast, the regression method shows a different order of importance: C1 > C3 > C2. That is, it suggests that the class needs to put more effort into geoscience to improve the score on the Basic Competence Test. For the evaluation of the Choquet integral with the entropy-based and the complexity-based methods, calculate the importance of each subset generated by all courses for m = 2, 3, 4, 5, 6, 7, 8, 9. The numerical values of the fuzzy measures for each subset are computed by Matlab and provided in Table 10. From Table 10, the importance given by the complexity-based method is larger than that given by the entropy-based method for each subset of all criteria. This means that the importance given by the entropy-based method is underestimated. The reason may be the error of estimating a population probability from a small sample of size 45.

The fourth step is to compute the overall performance of students by the four methods. For each student, the overall performance and the score of the Basic Competence Test are transformed into score levels for each item, as shown in Table 11, where M1, M2, M3, and M4 represent the weighted arithmetic mean method, the regression method, the Choquet integral with the entropy-based method, and the Choquet integral with the complexity-based method, respectively. The different numerical values in the Choquet integral column of Table 11 have different meanings: the higher the value of the Choquet integral, the better. Finally, the fifth step is to compare the predictions of the different methods under different m, as depicted in Table 12, where a higher value means better accuracy. The Choquet integral with the complexity-based method has the best accuracy among the four methods. A possible reason is that estimating population probabilities from the sample becomes worse when m is greater than 4. It is worth noting that the regression method has better accuracy than the weighted arithmetic mean method, since the regression method minimizes the error without assuming mutual interaction among courses.

Table 7

Student   C1   C2   C3   Obj     Student   C1   C2   C3   Obj
2         70   85   86   42      27        88   84   80   35
3         65   85   84   33      28        55   65   60   5
4         75   91   85   25      29        78   85   75   27
5         75   80   82   27      30        72   84   78   47
6         68   75   76   33      31        64   76   70   27
7         70   77   72   35      32        60   70   65   20
8         80   78   70   31      33        69   80   70   35
9         83   81   85   50      34        66   78   66   17
10        75   79   83   31      35        62   70   66   13
11        62   74   68   35      36        61   72   65   28
12        68   74   80   30      37        68   74   71   11
13        77   85   81   37      38        53   65   59   9
14        66   76   74   29      39        67   70   64   36
15        78   88   83   31      40        59   65   68   16
16        57   67   62   15      41        74   82   75   49
17        56   70   63   12      42        58   66   62   15
18        68   80   74   31      43        76   74   78   38
19        53   66   58   21      44        84   81   78   37
20        65   81   73   32      45        76   72   74   35
21        62   76   69   12
22        67   75   71   22
23        74   71   68   28
24        61   69   65   28
25        64   70   67   24

Table 8
The results of Cramer's correlation coefficients

      m = 2                         m = 3
      C1       C2       C3          C1       C2       C3
C1    1        0.5307   0.5737      1        0.5437   0.5284
C2    0.5307   1        0.7441      0.5437   1        0.6765
C3    0.5737   0.7441   1           0.5284   0.6765   1
      p < 0.01                      p < 0.01

      m = 4                         m = 5
      C1       C2       C3          C1       C2       C3
C1    1        0.5097   0.5744      1        0.5131   0.5885
C2    0.5097   1        0.6026      0.5131   1        0.5583
C3    0.5744   0.6026   1           0.5885   0.5583   1
      p < 0.01                      p < 0.01

      m = 6                         m = 7
      C1       C2       C3          C1       C2       C3
C1    1        0.4848   0.5537      1        0.4991   0.537
C2    0.4848   1        0.6212      0.4991   1        0.5821
C3    0.5537   0.6212   1           0.537    0.5821   1
      p < 0.01                      p < 0.01

      m = 8                         m = 9
      C1       C2       C3          C1       C2       C3
C1    1        0.515    0.5329      1        0.5164   0.5375
C2    0.515    1        0.5336      0.5164   1        0.5049*
C3    0.5329   0.5336   1           0.5375   0.5049*  1
      p < 0.01

* p > 0.01.

Table 9

Weights for each course by the weighted arithmetic mean and regression methods
