Inexact Reasoning
• Certainty factors
• Dempster-Shafer theory
• Zadeh’s fuzzy theory
Uncertainty
• The lack of adequate information to make a
decision
• Methods
1. Probability (Bayes’ Theorem)
2. Certainty Factor
3. Dempster-Shafer Theory
Why Certainty Factors (1)
• Difficulties with probability
• In the Bayesian approach, it is usually impossible to determine all the prior and posterior probabilities
P(D | E) = P(E | D) P(D) / P(E) = P(E | D) P(D) / Σ_j P(E | D_j) P(D_j)
– E: evidence
– D: diseases

Why Certainty Factors (2)
• Problem with belief &amp; disbelief raised by MYCIN domain experts
– The experts do not agree that belief and disbelief are related by
P(H) + P(H’) = 1 ?
• Another example
– If this is the last course required for a degree
• P(graduate | ‘A’ in this course) = 0.7
• P(not graduate | ‘A’ in this course) = 0.3
Certainty Factors (1)
• Developed for MYCIN
• Based on Carnap’s theory of confirmation
• Certainty Factor
– CF(H , E) = MB(H , E) – MD(H , E)
– CF is the certainty factor in the hypothesis H due to evidence E
– which is the difference between belief and disbelief
– MB is the measure of increased belief in H due to E
– MD is the measure of increased disbelief in H due to E
Certainty Factors (2)
• Belief:
MB(H , E) = 1 if P(H) = 1
MB(H , E) = [max(P(H | E) , P(H)) – P(H)] / (1 – P(H)) otherwise
• Disbelief:
MD(H , E) = 1 if P(H) = 0
MD(H , E) = [min(P(H | E) , P(H)) – P(H)] / (0 – P(H)) otherwise

Usage
• CF > 0 Support the hypothesis
• CF < 0 Support the negation of the hypothesis
• CF = 0 (1) No evidence (2) MB = MD
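The MB/MD/CF definitions above can be sketched in Python (a minimal sketch; the function names are hypothetical, the formulas are the ones just given):

```python
def mb(p_h_given_e: float, p_h: float) -> float:
    """Measure of increased belief in H due to E."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h_given_e: float, p_h: float) -> float:
    """Measure of increased disbelief in H due to E."""
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)

def cf(p_h_given_e: float, p_h: float) -> float:
    """Original certainty factor: belief minus disbelief."""
    return mb(p_h_given_e, p_h) - md(p_h_given_e, p_h)

# Evidence raises P(H) from 0.2 to 0.4: pure belief, CF > 0.
print(round(cf(0.4, 0.2), 2))   # 0.25
# Evidence lowers P(H) from 0.8 to 0.6: pure disbelief, CF < 0.
print(round(cf(0.6, 0.8), 2))   # -0.25
```

With no evidence, P(H | E) = P(H), so MB = MD = 0 and CF = 0, matching the usage rules above.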
Characteristics
• Ranges: 0 ≤ MB ≤ 1 , 0 ≤ MD ≤ 1 , –1 ≤ CF ≤ 1
• Certainly true hypothesis, P(H | E) = 1: MB = 1 , MD = 0 , CF = 1
• Certainly false hypothesis, P(H’ | E) = 1: MB = 0 , MD = 1 , CF = –1
Meaning
• Only the difference of belief and disbelief is
important
e.g. CF = 0.7 = 0.7 – 0 = 0.8 – 0.1
• CF(H , E) + CF(H’ , E) = 0
– different from probability, where P(H) + P(H’) = 1
– e.g. CF(H , E) = 0.70 , CF(H’ , E) = –0.70
I am 70% certain that I will graduate if I get an ‘A’ in this course
Improvement
_{1}
• Problem: one piece of evidence could dominate all other pieces of evidence
• CF(H , E) = MB(H , E) – MD(H , E)
e.g. MB = 0.999 from 10 pieces of evidence
MD = 0.799 from 1 piece of evidence
CF = 0.999 – 0.799 = 0.2
• But in MYCIN, a rule’s antecedent CF must be > 0.2 in order to be activated
Improvement (2)
• Change of CF definition:
CF = (MB – MD) / (1 – min(MB , MD))
• e.g. MB = 0.999
MD = 0.799
CF = (0.999 – 0.799) / (1 – 0.799) = 0.2 / 0.201 ≈ 0.995
→ rule will be activated
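With the revised definition, the example can be checked numerically (a minimal sketch; `cf_revised` is a hypothetical name):

```python
# Revised MYCIN certainty factor: CF = (MB - MD) / (1 - min(MB, MD)),
# so one piece of disbelief cannot wipe out many pieces of belief.

def cf_revised(mb: float, md: float) -> float:
    return (mb - md) / (1.0 - min(mb, md))

# Ten pieces of evidence give MB = 0.999; a single one gives MD = 0.799.
print(round(cf_revised(0.999, 0.799), 3))  # 0.995, well above the 0.2 threshold
```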
Combining Evidences
• E1 and E2: min[CF_E1 , CF_E2]
• E1 or E2: max[CF_E1 , CF_E2]
• NOT E1: –CF_E1
• Example:
E = (E1 and E2 and E3) or (E4 and not E5)
• Rules
Example
E = (E1 and E2 and E3) or (E4 and not E5)
E1 = 0.9
E2 = 0.8
E3 = 0.3
E4 = 0.5
E5 = 0.4
E = max[min(0.9 , 0.8 , 0.3) , min(0.5 , –0.4)] = max[0.3 , –0.4] = 0.3
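The evaluation above can be reproduced directly, mapping AND to min, OR to max, and NOT to negation:

```python
# E = (E1 and E2 and E3) or (E4 and not E5) under MYCIN's combination rules.
e1, e2, e3, e4, e5 = 0.9, 0.8, 0.3, 0.5, 0.4

e = max(min(e1, e2, e3), min(e4, -e5))
print(e)  # 0.3
```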
Rule Certainty
• IF E THEN H
[CF(H,E)]
partial evidence e
CF(H , e) = CF(E , e) * CF(H , E)
• Example:
CF(E , e) = 0.3
CF(H , E) = 0.7
CF(H , e) = 0.3 × 0.7 = 0.21
Combining Conclusions
• The same conclusion is derived from different evidences
• Example:
IF E1 THEN A [CF1]
IF E2 THEN A [CF2]
• Combining function:
CF = CF1 + CF2 (1 – CF1) if both > 0
CF = (CF1 + CF2) / (1 – min(|CF1| , |CF2|)) if one < 0
CF = CF1 + CF2 (1 + CF1) if both < 0
Example
CF1 = 0.21
CF2 = 0.5 → CF = 0.21 + 0.5 × (1 – 0.21) = 0.605
If a third rule yields the same conclusion:
CF3 = –0.4 → CF = (0.605 – 0.4) / (1 – min(0.605 , 0.4)) = 0.205 / 0.6 ≈ 0.34
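The combining function can be sketched as follows (`combine` is a hypothetical name; the third rule’s certainty is taken as –0.4, the value that reproduces the 0.34 result):

```python
# MYCIN-style combination of the CFs of two rules with the same conclusion.
def combine(cf1: float, cf2: float) -> float:
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    # one positive, one negative
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

cf = combine(0.21, 0.5)
print(round(cf, 3))    # 0.605
cf = combine(cf, -0.4)  # a third, refuting rule with CF3 = -0.4 (assumed)
print(round(cf, 2))    # 0.34
```

Because the function is commutative, the rules may be combined in any order; MYCIN therefore only needs to store the running combined CF.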
• Commutativity
CF(X , Y) = CF(Y , X)
• MYCIN stores the current combined CF (CF_COMBINE)
Rule 1: CF1(H , e) = CF1(E , e) × CF1(H , E)
Rule 2: CF2(H , e) = CF2(E , e) × CF2(H , E)
(each rule’s antecedent CF is first combined with AND (min), OR (max), NOT (–); the rules’ results are then combined at hypothesis H)
Advantages
• Simple computation
• Easy to understand
Disadvantages
• CF values can be the opposite of the conditional probabilities
e.g. P(H1) = 0.8 , P(H2) = 0.2
P(H1 | E) = 0.6 , P(H2 | E) = 0.4
CF(H1 , E) = –0.25 , CF(H2 , E) = 0.25
→ a higher conditional probability but a lower certainty factor: a contradiction
• In general: P(H | e) ≠ P(H | i) × P(i | e)
• But in MYCIN: CF(H , e) = CF(H , i) × CF(i , e)
→ only suitable for short inference chains
Dempster-Shafer Theory
• Models uncertainty by a range of probabilities
e.g. [0.9 , 0.95]
• Has a good theoretical foundation
• Suitable for evidential reasoning
• Deals with information that is expected to be uncertain, imprecise, and inaccurate
Environment
• Environment: a term in D-S theory
• A fixed set of elements, mutually exclusive and exhaustive:
Θ = {θ_1 , θ_2 , ... , θ_N}
– the universe of discourse in set theory
• Example: {airliner , bomber , fighter} or {red , green , blue , orange , yellow}
• Each subset of the environment is a possible answer to a question
– e.g. “Which is the military aircraft?” → {bomber , fighter}
All Subsets
• Power set of Θ
• Example: for Θ = {A , B , F} the power set is
{ψ , {A} , {B} , {F} , {A , B} , {A , F} , {B , F} , {A , B , F}}

Mass Function
• Each subset is assigned a mass
• The mass measures the degree of belief placed on that subset by the evidence
• When new evidence arrives, the masses change
• Example: aircraft identification
m({F}) = 0.5 , m({B}) = 0.3 , m({B , F}) = 0.2
• Other names: basic probability assignment (bpa), basic assignment
Nonbelief and Disbelief
• Disbelief: refutes a hypothesis
e.g. disbelief in {B , F} is belief in {A}
• D-S theory adopts nonbelief instead
• Nonbelief (or no belief): no evidence either way
e.g. nonbelief in {B , F} is mass assigned to {A , B , F}
Characteristics
• m(subset) ∈ [0 , 1]
• m(ψ) = 0
• Sum of all m(subset) = 1
• All nonbelief is assigned to the environment set
• Example: m({B , F}) = 0.2
D–S and Probability
Dempster-Shafer Theory vs. Probability Theory:
• m(Θ) does not have to be 1, whereas Σ_i p_i = 1
• If X ⊆ Y, it is not necessary that m(X) ≤ m(Y), whereas P(X) ≤ P(Y)
• No required relationship between m(X) and m(X’), whereas P(X) + P(X’) = 1

Combining Evidences
• Orthogonal sum: ⊕
• Example:
m1({B , F}) = 0.7 , m1({A , B , F}) = 0.3
m2({B}) = 0.9 , m2({A , B , F}) = 0.1
• Intersection table (m1 row × m2 column):
m1({B , F}) = 0.7 : with m2({B}) = 0.9 → {B} 0.63 ; with m2({A , B , F}) = 0.1 → {B , F} 0.07
m1({A , B , F}) = 0.3 : with m2({B}) = 0.9 → {B} 0.27 ; with m2({A , B , F}) = 0.1 → {A , B , F} 0.03
• Result:
m3({B}) = 0.63 + 0.27 = 0.9
m3({B , F}) = 0.07
m3({A , B , F}) = 0.03 [nonbelief]

Evidential Interval
• A range of belief
• Example : m3({B}) = 0.9
m3({B , F}) = 0.07 m3({A , B , F}) = 0.03
Evidential interval of {B} = [0.9 , 1.0]
• 0 ≤ Bel ≤ Pls ≤ 1
– lower bound: support (Spt) or belief (Bel)
– upper bound: plausibility (Pls)

Common Evidential Intervals
Evidential Interval → Meaning
• [1 , 1] : completely true
• [0 , 0] : completely false
• [0 , 1] : completely ignorant
• [Bel , 1] where 0 < Bel < 1 : tends to support
• [0 , Pls] where 0 < Pls < 1 : tends to refute
• [Bel , Pls] where 0 < Bel < Pls < 1 : tends to both support and refute
Belief Function
• The sum of the masses of a set and all its subsets
• Example:
Bel({B , F}) = m({B , F}) + m({B}) + m({F})
m({F}) = 0.5
m({B}) = 0.2
m({B , F}) = 0.2
Bel({B , F}) = 0.9
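A minimal sketch of the belief function over frozenset-keyed masses; the example masses above sum to 0.9, so the remaining 0.1 is assumed here to sit on the environment {A , B , F} as nonbelief:

```python
# Bel(S): sum of the masses of S and all of its subsets.
def bel(masses: dict, s: frozenset) -> float:
    return sum(m for subset, m in masses.items() if subset <= s)

masses = {
    frozenset({"F"}): 0.5,
    frozenset({"B"}): 0.2,
    frozenset({"B", "F"}): 0.2,
    frozenset({"A", "B", "F"}): 0.1,  # assumed nonbelief on the environment
}

print(round(bel(masses, frozenset({"B", "F"})), 4))  # 0.9
```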
Combining Belief
• Bel1 ⊕ Bel2({B , F})
= m1 ⊕ m2({B , F}) + m1 ⊕ m2({B}) + m1 ⊕ m2({F})
(null sets contribute zero)
• Bel1 ⊕ Bel2({A , B , F}) = 1
Evidential Interval
• EI(S) = [Bel(S) , 1 – Bel(S’)]
– can be expressed as [total belief , plausibility]
• Example: m3({B}) = 0.9 , m3({B , F}) = 0.07 , m3({A , B , F}) = 0.03
S = {B , F} , S’ = {A}
Bel({B , F}) = 0.9 + 0.07 = 0.97
Bel({A}) = 0
EI({B , F}) = [0.97 , 1 – 0] = [0.97 , 1]
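The interval computation can be sketched as follows (frozenset keys and the helper names are assumptions):

```python
# EI(S) = [Bel(S), 1 - Bel(S')], with S' the complement of S in the environment.
def bel(masses: dict, s: frozenset) -> float:
    return sum(m for subset, m in masses.items() if subset <= s)

env = frozenset({"A", "B", "F"})
m3 = {
    frozenset({"B"}): 0.9,
    frozenset({"B", "F"}): 0.07,
    env: 0.03,
}

def ei(masses: dict, s: frozenset):
    return (bel(masses, s), 1 - bel(masses, env - s))

lo, hi = ei(m3, frozenset({"B", "F"}))
print(round(lo, 2), round(hi, 2))  # 0.97 1.0
```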
Normalization of Belief
• Example:
m1({A}) = 0.95 [new evidence] , m1({A , B , F}) = 0.05
m2({B}) = 0.9 , m2({B , F}) = 0.07 , m2({A , B , F}) = 0.03
• Raw orthogonal sum:
m3({A}) = 0.95 × 0.03 = 0.0285
m3({B}) = 0.05 × 0.9 = 0.045
m3({B , F}) = 0.05 × 0.07 = 0.0035
m3({A , B , F}) = 0.05 × 0.03 = 0.0015
m3(ψ) = 0.95 × 0.9 + 0.95 × 0.07 = 0.9215
• Normalization: divide each nonempty set’s mass by 1 – m3(ψ) = 0.0785
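The full combination, including the normalization step, can be sketched as (the helper name `combine` and the frozenset representation are assumptions):

```python
# Dempster's rule of combination: intersect every pair of focal sets,
# multiply their masses, and renormalize by the non-conflicting mass.
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    raw, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass falling on the empty set
    # Normalize: redistribute the conflicting mass over the nonempty sets.
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

env = frozenset({"A", "B", "F"})
m1 = {frozenset({"A"}): 0.95, env: 0.05}
m2 = {frozenset({"B"}): 0.9, frozenset({"B", "F"}): 0.07, env: 0.03}

m3 = combine(m1, m2)
for s, v in sorted(m3.items(), key=lambda kv: -kv[1]):
    print(sorted(s), round(v, 4))
```

After normalization the masses again sum to 1; most of the combined belief lands on {B}, even though the new evidence strongly favored {A}, which illustrates how sensitive the rule is to highly conflicting evidence.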