
Statistical Analysis and Soft Computing Methods for Imprecise Data


National Science Council, Executive Yuan: Research Project Final Report


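Project type: Individual project
Project number: NSC92-2213-E-004-003-
Project period: August 1, 2003 to July 31, 2004
Institution: Department of Applied Mathematics, National Chengchi University
Principal investigator: 吳柏林 (B. Wu)
Report type: Condensed report
Handling: This project is open to public inquiry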

September 6, 2004 (Republic of China year 93)


吳柏林 (B. Wu), Department of Applied Mathematics, National Chengchi University

2004/8/12


Introduction

First, we present evidence that other non-probabilistic forms of uncertainty, such as belief functions ([20]) and possibility distributions ([4], [24]), can be expressed in terms of random sets. Note that this point of view has been adopted in much current research.

(a) Belief Functions: Belief functions are generalizations of degrees of belief in a Bayesian framework, where beliefs are subjective probabilities. They are essentially perception-based information. Belief functions are axiomatized by Shafer as follows.

Let $U$ be a finite set, say $U = \{u_1, u_2, \ldots, u_k\}$. A set-function $F$ is called a belief function if $F: 2^U \to [0,1]$ is such that:

(i) $F(\emptyset) = 0$, $F(U) = 1$;

(ii) $F$ is monotone of infinite order, i.e. for any $n \ge 1$ and $A_i \subseteq U$, $i = 1, 2, \ldots, n$,

$$F\Big(\bigcup_{i=1}^{n} A_i\Big) \ge \sum_{\emptyset \neq I \subseteq \{1, \ldots, n\}} (-1)^{|I|+1} F\Big(\bigcap_{j \in I} A_j\Big).$$

Remark: (ii) is simply a weakened form of Poincaré's equality for probability measures. Clearly, $F$ need not be additive.

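But then the set-function $f: 2^U \to \mathbb{R}$,

$$f(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} F(B),$$

is non-negative and $\sum_{A \subseteq U} f(A) = 1$.

As such, $f(\cdot)$ is a bona fide probability mass (density) function on $2^U$. More specifically, there is a random element (formally, a "random set") $S: (\Omega, \mathcal{A}, P) \to 2^U$ such that $P(S = A) = f(A)$ and $F(A) = P(S \subseteq A)$.

For such $S$, the belief function $F$ is nothing else than the counterpart of the distribution function of a random variable. Indeed, by Möbius inversion on the finite poset $(2^U, \subseteq)$, we have

$$F(B) = \sum_{A \subseteq B} f(A) = P(S \subseteq B).$$

Theorem: $T(K) = P_S(\mathcal{F}_K)$, $K \in \mathcal{K}$, where, in the standard notation for random closed sets, $\mathcal{K}$ is the class of compact sets and $\mathcal{F}_K$ is the class of closed sets that hit $K$.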

Example. Let $\varphi: \mathbb{R}^d \to [0,1]$ be a possibility distribution which is upper semicontinuous (u.s.c.). Then $S(\omega) = \{x \in \mathbb{R}^d : \varphi(x) \ge \alpha(\omega)\}$, where $\alpha$ is a random variable uniformly distributed on $[0,1]$, is a random closed set on $\mathbb{R}^d$ whose capacity functional is

$$T(K) = \sup\{\varphi(x) : x \in K\}.$$

Thus, a possibility measure generates the probability law of a random closed set.
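The example above can be checked numerically. The following is a minimal sketch under stated assumptions, not the project's own simulation code: it takes d = 1, uses a triangular possibility distribution chosen purely for illustration, restricts K to a finite set, and estimates T(K) = P(S ∩ K ≠ ∅) by Monte Carlo. The names `phi`, `hits`, and `estimate_capacity` are hypothetical.

```python
import random

def phi(x):
    """Illustrative triangular possibility distribution on the real line (peak 1 at x = 0)."""
    return max(0.0, 1.0 - abs(x))

def hits(alpha, K):
    """True if the level set S = {x : phi(x) >= alpha} meets the finite set K."""
    return any(phi(x) >= alpha for x in K)

def estimate_capacity(K, n_draws=20000, seed=0):
    """Monte Carlo estimate of T(K) = P(S hits K), with alpha uniform on [0, 1]."""
    rng = random.Random(seed)
    count = sum(hits(rng.random(), K) for _ in range(n_draws))
    return count / n_draws

K = [0.5, 0.75, 2.0]              # a finite (hence compact) test set, assumed for illustration
exact = max(phi(x) for x in K)    # sup of phi over K
approx = estimate_capacity(K)     # should be close to exact
```

With enough draws, `approx` should agree with `exact` up to sampling error, which is the content of the capacity-functional formula.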

Remark. The above capacity functional is of a very special kind, namely it is maxitive, i.e. for any $A, B$ in $\mathcal{K}$, $T(A \cup B) = \max(T(A), T(B))$. It can be shown that any maxitive set-function is necessarily alternating of infinite order. Thus, we can look for models for random sets among such set-functions. Maxitive set-functions appear in many places, such as Kuratowski's measure of non-compactness, all fractal dimensions, and limits of probability measures in the sense of the Large Deviation Principle (LDP).

Here is an interesting example. Let $f \in L^\infty(\Omega, \mathcal{A}, P)$, i.e. $\|f\|_\infty = \inf\{a \ge 0 : P(\omega : |f(\omega)| > a) = 0\} < \infty$. Since $P$ is a finite measure, it follows that $f \in L^p(\Omega, \mathcal{A}, P)$ for all $p > 0$. Consider the sequence of probability measures

$$P_n(A) = \frac{\int_A |f|^n \, dP}{\int_\Omega |f|^n \, dP}.$$

Then

$$(P_n(A))^{1/n} = \frac{\|f 1_A\|_n}{\|f\|_n} \to \frac{\|f 1_A\|_\infty}{\|f\|_\infty} \quad \text{as } n \to \infty.$$

But it can be checked that the set-function $\tau(A) = \|f 1_A\|_\infty$ is maxitive. Thus, in a sense, possibility measures are LDP limits of probability measures. This fact is pointed out in [21], where possibility measures are called idempotent probabilities.

This fact not only puts the calculus of possibilities on a firm basis for deriving deduction rules, but also testifies that possibility measures are relevant for studying "rare events" which are important in decision-making.
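The limit in the example above can be illustrated on a finite probability space. The sketch below is an illustration only: the four-point space, the weights `P`, and the function `f` are assumed values, and `p_n` and `tau` are hypothetical helper names. It computes the n-th roots of P_n(A) for increasing n and compares them with the ratio of essential suprema; it also checks maxitivity of τ on one pair of sets.

```python
def p_n(A, f, P, n):
    """P_n(A) = (sum over A of f^n dP) / (sum over Omega of f^n dP) on a finite space."""
    num = sum(f[w] ** n * P[w] for w in A)
    den = sum(f[w] ** n * P[w] for w in P)
    return num / den

def tau(A, f, P):
    """Essential supremum of f on A: the max of f over points of A with positive probability."""
    return max((f[w] for w in A if P[w] > 0), default=0.0)

# Assumed finite probability space and non-negative bounded function.
P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
f = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 4.0}
A = {"a", "c"}
B = {"b"}

limit = tau(A, f, P) / tau(set(P), f, P)
roots = [p_n(A, f, P, n) ** (1.0 / n) for n in (1, 10, 100)]
```

The entries of `roots` should move toward `limit` as n grows, and `tau(A | B, f, P)` should equal the larger of `tau(A, f, P)` and `tau(B, f, P)`.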

Possibility distributions can also be viewed as Radon-Nikodym derivatives of possibility measures. Let the reference possibility measure $\nu_o$ be

$$\nu_o(A) = \begin{cases} 1 & \text{if } A \neq \emptyset, \\ 0 & \text{if } A = \emptyset, \end{cases}$$

and consider the Choquet integral of a measurable non-negative function $f$ with respect to a non-decreasing set-function $\mu$,

$$(C)\int_A f \, d\mu = \int_0^\infty \mu\big((f \ge t) \cap A\big) \, dt.$$

Then the possibility measure $\pi$ associated with the possibility distribution $\varphi$ is written as $\pi(A) = (C)\int_A \varphi \, d\nu_o$, so that

$$\varphi = \frac{d\pi}{d\nu_o}.$$
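On a finite set, the Choquet integral reduces to a finite sum over the distinct values of the integrand, so the identity π(A) = sup of the possibility distribution over A can be checked directly. The following is a minimal sketch under that finite-set assumption; `choquet`, `nu0`, and the sample distribution `phi` are hypothetical names and values introduced only for illustration.

```python
def choquet(f, A, mu):
    """Choquet integral over the finite set A of a non-negative function f (a dict)
    with respect to a monotone set-function mu (a callable on frozensets)."""
    total, prev = 0.0, 0.0
    for t in sorted({f[x] for x in A}):
        level = frozenset(x for x in A if f[x] >= t)   # the set (f >= t) intersected with A
        total += (t - prev) * mu(level)
        prev = t
    return total

def nu0(A):
    """Reference possibility measure: 1 on non-empty sets, 0 on the empty set."""
    return 1.0 if A else 0.0

phi = {"u1": 0.2, "u2": 1.0, "u3": 0.7}   # assumed possibility distribution
A = {"u1", "u3"}
pi_A = choquet(phi, A, nu0)               # should match max(phi[x] for x in A)
```

Because the reference measure equals 1 on every non-empty level set, the sum telescopes to the largest value of `phi` on `A`.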

This background is useful for decision analysis. For example, in designing intelligent decision-support systems that help decision-makers choose among options for action, we analyze a multi-criteria decision-making problem in order to propose appropriate aggregation operators that summarize local information from profiles or observed data, leading to the ranking procedure needed for selection. This can be achieved by an aggregation operator based on the Choquet integral of the profiles with respect to a possibility measure, generalizing the standard expected-value criterion of statistical decision theory. The novel aspect of this approach is that it takes into account the interactions between criteria, in the spirit of coalition games. We intend to pursue this research issue for applications.
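A ranking procedure of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the operator actually developed in the project: the criteria names, the possibility distribution over criteria, and the two option profiles are invented for the example, and `possibility` and `choquet_score` are hypothetical helpers.

```python
def possibility(phi):
    """Possibility measure pi(A) = max of phi over A (0 for the empty set)."""
    return lambda A: max((phi[c] for c in A), default=0.0)

def choquet_score(profile, mu):
    """Choquet integral of a non-negative profile (criterion -> score) with respect to mu."""
    total, prev = 0.0, 0.0
    for t in sorted(set(profile.values())):
        level = frozenset(c for c in profile if profile[c] >= t)
        total += (t - prev) * mu(level)
        prev = t
    return total

# Assumed possibility distribution over three criteria and two option profiles.
phi = {"cost": 1.0, "safety": 0.6, "speed": 0.3}
profiles = {
    "option_1": {"cost": 0.4, "safety": 0.9, "speed": 0.8},
    "option_2": {"cost": 0.6, "safety": 0.5, "speed": 0.2},
}

pi = possibility(phi)
scores = {name: choquet_score(p, pi) for name, p in profiles.items()}
ranking = sorted(scores, key=scores.get, reverse=True)   # best option first
```

Unlike a weighted average, the score of each option depends on which groups of criteria are jointly satisfied at each level, which is how the set-function carries interaction between criteria.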

Statistical Analysis with coarse data.

The analysis of perception-based information for decision-making is not an isolated area. Indeed, it is rooted in a realistic situation in statistics, mainly in biostatistics, where observed data are in general of low quality (coarse data), expressing a form of incomplete or vague data. Coarse data such as coarse probabilities (e.g. interval-valued probabilities) are treated entirely within the framework of random sets, since the unobservables are nothing else than selectors of the observables, which are random sets. Thus, we also intend to use results from the statistics of set-valued observations for the benefit of intelligent-systems design, especially in simulations of random sets leading to assessments of possibility distributions.

The following background on statistics with random sets will be useful in deriving simulations for random sets which, in turn, should provide guidelines for assessing perception-based probabilities whose values are linguistic, expressed as possibility distributions.

Inference with coarse data is needed when the values of the variable of interest are not observed but are only known to lie in some sets, as in inference with missing data in multivariate analysis, censored data in survival analysis, or grouped data in general. A simple framework is this.

Let $X: \Omega \to U$ ($U$ finite). A coarsening of $X$ is a non-empty random set $S$ on $U$ (i.e. $S: \Omega \to 2^U \setminus \{\emptyset\}$) such that $P(X \in S) = 1$.

Note that $S$ is a coarsening of $X$ if and only if

$$\sum_{A} \sum_{x \in A} P(X = x, S = A) = 1.$$

Also, if $S$ is a coarsening of $X$, then necessarily the probability law $P_X$ has to belong to the core of $S$. Specifically, let $f$ denote the density of $S$, i.e. $f(A) = P(S = A)$; then the distribution function $F$ of $S$ is

$$F(A) = P(S \subseteq A) = \sum_{B \subseteq A} f(B).$$

Note that by Möbius inversion,

$$f(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} F(B),$$

where $|A|$ denotes the cardinality of the set $A$. The core of $S$, or more precisely the core of $F$, is

$$\mathrm{core}(F) = \{\text{probability measures } P \text{ on } U \text{ such that } F(\cdot) \le P(\cdot)\}.$$
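The relations between the density f, the distribution function F, and the core can be verified by brute force on a small set. The sketch below is an illustration only: the three-element set `U`, the density `f`, and the candidate laws `P_ok` and `P_bad` are assumed values, and the helper names are hypothetical. Exact rational arithmetic is used so that the Möbius inversion recovers f without rounding.

```python
from fractions import Fraction
from itertools import combinations

def subsets(A):
    """All subsets of the finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def distribution(f, U):
    """F(A) = P(S is a subset of A) = sum of f(B) over all B contained in A."""
    return {A: sum(f.get(B, Fraction(0)) for B in subsets(A)) for A in subsets(U)}

def mobius(F, U):
    """f(A) = sum over B contained in A of (-1)^|A minus B| * F(B)."""
    return {A: sum((-1) ** len(A - B) * F[B] for B in subsets(A)) for A in subsets(U)}

def in_core(P, F, U):
    """True if the point-mass dict P satisfies F(A) <= P(A) for every subset A of U."""
    return all(F[A] <= sum(P[x] for x in A) for A in subsets(U))

# Assumed density of a random set S on U = {a, b, c}.
U = frozenset({"a", "b", "c"})
f = {frozenset({"a"}): Fraction(1, 2), frozenset({"b", "c"}): Fraction(1, 4), U: Fraction(1, 4)}

F = distribution(f, U)
recovered = {A: m for A, m in mobius(F, U).items() if m != 0}   # should equal f

P_ok = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
P_bad = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
```

Here `P_bad` puts less mass on `a` than S puts on the singleton `{a}`, so it cannot be the law of any variable X that S coarsens.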

Clearly, given X, there are many possible coarsenings of X. Among the coarsenings of X, there are special ones which are "nice" for statistical inference about the unknown distribution $P_X$ of X when X cannot be directly observed. The following popular model for coarse data will be used in our research. A coarsening S is said to be a Coarsening At Random (CAR) if, for every $A \in 2^U$, $P(S = A \mid X = x)$ is constant in $x \in A$.

It can be shown that the statistical meaning of this property is: for all $A \in 2^U \setminus \{\emptyset\}$ and $x \in A$,

$$P(X = x \mid S = A) = P(X = x \mid X \in A),$$

i.e. as far as the distribution of the random variable X is concerned, it suffices to know that X falls into some known set A (in likelihood inference, say). This popular model for coarse data has been used especially in making inference about the underlying variables of interest as well as in providing simulated data. We intend to use this model of coarsening for empirical analysis of random set observations in the process of assigning possibility distributions (perception-based probabilities) to linguistic probability values.
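The CAR property and its statistical meaning can be checked on a toy model. The following sketch rests on assumed values: the law `p_x` of X on {1, 2, 3} and the coarsening probabilities `car` (the common value of P(S = A | X = x) for x in A) are invented for illustration, chosen so that for each x the probabilities of the sets containing x sum to 1.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Assumed law of X on U = {1, 2, 3}.
p_x = {1: Fraction(1, 5), 2: Fraction(3, 10), 3: half}

# Assumed CAR mechanism: P(S = A | X = x) = car[A] for every x in A.
car = {
    frozenset({1}): half,
    frozenset({1, 2}): half,
    frozenset({2, 3}): half,
    frozenset({3}): half,
}

# Joint law P(X = x, S = A), supported on pairs with x in A.
joint = {(x, A): p_x[x] * c for A, c in car.items() for x in A}

def cond_given_S(x, A):
    """P(X = x | S = A), computed from the joint law."""
    p_S = sum(p for (_, B), p in joint.items() if B == A)
    return joint.get((x, A), Fraction(0)) / p_S

def cond_given_X_in(x, A):
    """P(X = x | X in A), computed from the law of X alone."""
    return p_x[x] / sum(p_x[y] for y in A) if x in A else Fraction(0)
```

Under CAR the two conditional probabilities coincide for every observed set A and every x in A, so observing S = A carries no more information about X than the event X ∈ A.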


In order to carry out reasoning in decision systems, it is necessary to supply sufficient information to the knowledge base. For example, in simulation models for missions, various conditional probabilities (of rules) are needed. These are in general obtained by asking experts. Their perception-based probabilities are in general linguistic. According to Zadeh ([24]), linguistic probabilities are possibility distributions. As such, fuzzy logic operations could be used to manipulate them.

Now, from our approach of representing possibility distributions as covering functions of random sets, it is plausible that fuzzy logic can be combined with random sets to provide solid theoretical results for logical deductions. For example, when computing various logical expressions of perception-based probabilities, we first represent each such probability value as a possibility distribution, which is viewed as the covering function of some random set. The natural logical connectives on random sets will suggest appropriate fuzzy logic operations to use (t-norms, t-conorms, implication operators, and so on; see e.g. [14]) and will help to prove their validity within a firm foundation of real analysis and measure theory, as in deriving probability logic in Artificial Intelligence, see e.g. [20]. We intend to pursue this important issue in this project.
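One standard way to see how set operations on random sets suggest t-norms is to look at the covering function of an intersection of two level-set random sets. The sketch below is an illustration under assumptions, not the project's derivation: each random set is taken to be a level set at a uniform threshold, `a` and `b` stand for the values of two possibility distributions at a fixed point x, and `coverage_of_intersection` is a hypothetical helper. With a shared threshold the coverage is the minimum of the two values; with independent thresholds it is their product.

```python
import random

def covers(phi_x, alpha):
    """True if x belongs to the level set {phi >= alpha}, given phi(x) = phi_x."""
    return phi_x >= alpha

def coverage_of_intersection(a, b, coupled, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(x in S1 and x in S2), where phi1(x) = a and phi2(x) = b.

    coupled=True uses one shared uniform threshold for both sets (nested level sets);
    coupled=False draws the two thresholds independently.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(n_draws):
        alpha1 = rng.random()
        alpha2 = alpha1 if coupled else rng.random()
        count += covers(a, alpha1) and covers(b, alpha2)
    return count / n_draws

a, b = 0.6, 0.5   # assumed membership values at a fixed point x
nested = coverage_of_intersection(a, b, coupled=True)        # close to min(a, b)
independent = coverage_of_intersection(a, b, coupled=False)  # close to a * b
```

The choice of dependence between the two random sets therefore selects the t-norm: the minimum for nested sets and the product for independent ones.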

Knowledge with uncertain conditional.

In our previous work ([20]), see also [8], [9], [10], a second-order probability approach (in a Bayesian spirit) has been applied to situations when prior probability distributions (e.g. Dirichlet distributions) are available. These works complement Bayesian networks when the conditional independence assumption of variables involved is not plausible. However, we did encounter cases where we need to reason with assertions with "high" probability, such as reasoning with rules which have exceptions.

This is a special case of linguistic perception-based probabilities. For example, from a knowledge rule base of the form "If $A_i$ then $B_i$", where $A_i$ and $B_i$ are linguistic labels modeled as fuzzy events or possibility distributions and whose reliability is quantified by conditional probabilities, one can assess the validity of a new rule "If C then D". We intend to extend our previous work to the realm of perception-based probabilities, which present much more realistic information than prior probability distributions. In this context, we intend to develop a comprehensive theory of decision based on the connection between random sets and possibility theory, and to apply it to the design of cognitive decision systems.

We have shown that for interval expert estimates, the membership degree corresponding to the fuzzy approach can be reformulated in statistical terms. A natural question is the converse: can we reconstruct all the probabilities corresponding to the statistical approach from the fuzzy values $\mu(x_i)$? In other words, if we know the values $\mu(x_i)$ for all $i$, can we then uniquely reconstruct the probabilities $p([x^-, x^+])$ of different intervals? In short, are the fuzzy and interval-valued statistical approaches truly equivalent? The general answer to this question is “No”:

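Proposition. There exist two different probability distributions $p_1$ and $p_2$ on intervals for which $\mu_1(x_i) = \mu_2(x_i)$ for all $i$.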

Proof. Both distributions correspond to two experts ($n = 2$) and a 4-element set $X = \{x_1 < x_2 < x_3 < x_4\}$.

In the first example, the first expert considers the interval $[x_1, x_3]$ to be possible, and the second expert considers the interval $[x_2, x_4]$. Here, the probabilities of the intervals $[x_1, x_3]$ and $[x_2, x_4]$ are equal to 0.5: $p_1([x_1, x_3]) = p_1([x_2, x_4]) = 0.5$, while the probabilities of all other intervals are equal to 0.

In the second example, the first expert considers the interval $[x_1, x_4]$ (i.e., all possible values of $x$) to be possible, while the second expert considers the interval $[x_2, x_3]$. Here the probabilities of the intervals $[x_1, x_4]$ and $[x_2, x_3]$ are equal to 0.5: $p_2([x_1, x_4]) = p_2([x_2, x_3]) = 0.5$, while the probabilities of all other intervals are equal to 0.

These two probability distributions are different: e.g., $p_1([x_1, x_3]) = 0.5 \neq 0 = p_2([x_1, x_3])$. On the other hand, for both distributions we have the same membership function $\mu$: $\mu_1(x_1) = \mu_2(x_1) = 0.5$, $\mu_1(x_2) = \mu_2(x_2) = 1$, $\mu_1(x_3) = \mu_2(x_3) = 1$, and $\mu_1(x_4) = \mu_2(x_4) = 0.5$. The proposition is proven.
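The counterexample can be reproduced in a few lines. The sketch below assumes that the membership degree of a point is the total probability of the expert intervals that contain it, which is consistent with the values stated in the proof but is not spelled out in this section; points x1 to x4 are encoded by their indices 1 to 4, and `membership` is a hypothetical helper name.

```python
def membership(p, points):
    """mu(x) = total probability of the intervals [lo, hi] in p that contain x."""
    return {x: sum(w for (lo, hi), w in p.items() if lo <= x <= hi) for x in points}

points = [1, 2, 3, 4]                 # indices of x1 < x2 < x3 < x4

# Interval distributions of the two examples, keyed by (lower index, upper index).
p1 = {(1, 3): 0.5, (2, 4): 0.5}
p2 = {(1, 4): 0.5, (2, 3): 0.5}

mu1 = membership(p1, points)
mu2 = membership(p2, points)
```

The dictionaries `p1` and `p2` differ, yet `mu1` and `mu2` coincide, so the membership function alone does not determine the interval probabilities.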

Conclusion

The project has accomplished its objectives as follows.

(i) Theoretical results on using random sets as a cornerstone in modeling various types of uncertainty and in inference with imprecise data.

(ii) Within such a framework, reasoning mechanisms will be rigorously developed, leading to a firm decision theory for perception-based information, with applications to some cognitive decision systems, such as multi-criteria decision-support systems.

(iii) The case studies proposed in the project, on patients in an emergency department and on the tracking of high-speed targets, are important not only in the academic field but also in practical applications. They will help people take more information into consideration when facing important decision problems.

References

[1] J. M. Blin, “Fuzzy relations in group decision theory”, J. of Cybernetics, 1974, Vol. 4, pp. 17-22.

[2] Berthold, M. and Hand, D.J. (Editors) (1999) Intelligent Data Analysis. Springer-Verlag.

[4] J. Goutsias, R. P. S. Mahler, and H. T. Nguyen (eds.), Random Sets: Theory and Applications, Springer-Verlag, N.Y., 1997.

[5] Goodman, I.R., Mahler, R. and Nguyen, H.T. (1997) Mathematics of Data Fusion. Kluwer Academic Press.

[6] G. Klir and B. Yuan, Fuzzy sets and fuzzy logic: theory and applications, Prentice Hall, Upper Saddle River, NJ, 1995.

[7] H. T. Nguyen, “Some mathematical tools for linguistic probabilities”, Fuzzy Sets and

[8] B. Wu and W. Yang, “Application of fuzzy statistics in the sampling survey”, In: Development and Application for the Quantity Methods of Social Science, Academia Sinica, Taiwan, pp. 289-316.

[9] Goutsias, J., Mahler, R. and Nguyen, H.T. (1997) Random Sets: Theory and Applications. IMA Volumes in Mathematics and its Applications, no 97, Springer-Verlag.

[7] Haddawy, P. (1994) Representing Plans under Uncertainty. Lecture Notes in Artificial Intelligence, No 770, Springer-Verlag.

[10] Hall, J. and Lawry, J. (2001) Imprecise probabilities of engineering system failure for random and fuzzy set reliability analysis. Proceedings 2nd Symposium on Imprecise Probabilities and Their Applications, Ithaca, N.Y., 195-294.

[11] Lehmann, D. and Magidor, M. (1992) What does a conditional knowledge entail? Artificial Intelligence (55), 1-60.

[12] Lukasiewicz, T. (2002) Probabilistic default reasoning with conditional constraints. Annals of Mathematics and Artificial Intelligence (34), 35-88.

[13] Nguyen, H.T. and Bouchon-Meunier, B. (2002) Random sets and large deviations principle as a foundation for possibility measures.

[14] Nguyen, H.T., Prasad, N., Walker, C. and Walker, E.A. (2002) A First Course in Fuzzy and Neural Control. Chapman and Hall/CRC Press, to appear Nov. 2002.

[15] Nguyen, H.T. (2000). Some mathematical structures for computational information. Journal of Information Sciences (128), 67-89.

[16] Nguyen, H., Wang, T. and Wu, B. (2003). On probabilistic methods in Fuzzy Theory. International Journal of Intelligent Systems. (to appear)

[17] Terano, T. et al (2001) New Frontiers in Artificial Intelligence. Springer-Verlag.

[18] Wu, B. and Sun, C. (2001). Interval-valued statistics, fuzzy logic, and their use in computational semantics. Journal of Intelligent and Fuzzy Systems. 11, 1-7.

[19] Wu, B. and Tseng, N. (2002). A New Approach to Fuzzy Regression Models with Application to Business Cycle Analysis. Fuzzy Sets and Systems. 130, 33-42.

[20] Wu, B. and Hsu, Yu-Yun. (2003). The Use of Kernel Set and Sample Memberships in the Identification of Nonlinear Time Series. Soft Computing Journal. (to appear)

[21] Walley, P. (1987) Statistical Reasoning with Imprecise Probabilities. Chapman and Hall.

[22] Zadeh, L.A. (2002) Toward a perception-based theory of probabilistic reasoning with
