Advances in Applying Stochastic-Dominance Relationships to Bounding
Probability Distributions in Bayesian Networks
Chao-Lin Liu
Department of Computer Science, National Chengchi University
Taipei 11605, Taiwan. chaolin@nccu.edu.tw
Abstract
Bounds of probability distributions are useful for many reasoning tasks, including resolving the qualitative ambiguities in qualitative probabilistic networks and searching for the best path in stochastic transportation networks. This paper investigates a subclass of the state-space abstraction methods that are designed to approximately evaluate Bayesian networks. Taking advantage of particular stochastic-dominance relationships among random variables, these special methods aggregate states of random variables to obtain bounds of probability distributions at much reduced computational costs, thereby achieving high responsiveness of the overall system.
The existing methods demonstrate two drawbacks, however. The strict reliance on the particular stochastic-dominance relationships confines their applicability. Also, being designed for general Bayesian networks, these methods might not achieve their best performance in special domains, such as fastest-path planning problems. The author elaborates on these problems, and offers extensions to improve the existing approximation techniques.
Keywords: Bayesian Networks, Stochastic Dominance, Approximate Reasoning
1. Introduction
In the past decade, Bayesian networks have become a major formalism for capturing and reasoning about uncertainty in complex applications [6]. A Bayesian network encodes, respectively, qualitative and quantitative probabilistic relationships among random variables in terms of a directed acyclic graph and conditional probability tables [8, 17]. Given observations about some random variables, we evaluate the Bayesian network to obtain the conditional probability distributions of random variables of interest. The evaluation process is also known as inference in Bayesian networks, and this active research field has seen a wide variety of approaches for computing exact and approximate probability distributions. Approximation algorithms allow us to obtain useful information about the desired probability distributions at reduced computational costs when specific
This paper appeared in the Proceedings of the IASTED International Conference on Artificial and Computational Intelligence 2002, 251-256, Tokyo, Japan, 25-27 September 2002.
application-dependent constraints do not permit exact inference. D'Ambrosio offers a very informative survey in [3], and some recent developments include [10, 15, 16].
We can classify approximate inference procedures from different perspectives. In terms of how we carry out the approximations, D'Ambrosio comes up with two schools of algorithms: approximate inference methods compute distributions with special algorithms using the original network, e.g., [2, 18], and model reduction methods employ exact algorithms after simplifying the original network, e.g., [14, 20]. Classifying in terms of the types of outcomes of approximation procedures, we see that some algorithms compute upper and/or lower bounds, e.g., [4, 7], while others compute point-valued approximations of the desired probability distributions [2, 20].
In the following figure, let E be the curve of the exact cumulative distribution function of a random variable X. Approximation algorithms may compute the upper and lower bounds, U and L, respectively, or the point-valued approximation, A, of E. The curve L is called a lower bound because it suggests that the distribution of the random variable X tilts to its lower range, although geometrically L lies on the upper side of the curve E.
Figure 1. Approximations of the exact distribution E
The state-space abstraction (SSA) methods compute approximate probability distributions by first simplifying the given Bayesian network. Depending on how we simplify the networks, we can compute either the point-valued approximations [14] or bounds of the desired probability distributions [12]. For computing the bounds, the SSA methods require that the underlying conditional distributions encoded in the Bayesian networks exhibit the stochastic dominance property [23]. Although this property may hold for some applications, there are applications in which this assumption is slightly violated. In this paper, I discuss how we can revise the original SSA methods and expand their applicability into this arena. Also, hoping to find tighter bounds, the SSA methods employ heuristics for selecting the abstract model of the original network such that we obtain the best approximations possible. It is, however, very difficult to design heuristics good for all possible probability distributions. This paper reports a new heuristic tailored for fastest-path planning problems [13].
The following section presents details of stochastic dominance and its applications with the state-space abstraction methods. Section 3. investigates the situations in which the requirement for stochastic dominance can be relaxed, and provides the revised SSA methods. Section 4. discusses and proposes a new strategy for obtaining the best approximations when applying the SSA methods to fastest-path planning problems. Section 5. provides an outline of applications of the new methods, and Section 6. concludes this paper with a brief summary.
2. Background
In this paper, we use capital and small letters to denote random variables and their values, respectively. The possible values of a random variable are also called states of the random variable, and we attach subscripts to them when necessary. Also, we adopt the shorthand Pr(x | y, z) for Pr(X = x | Y = y, Z = z). Most of the time, we need to refer to a set of random variables, and we use bold letters for sets. We also follow the tradition of calling a node a parent when it has an outgoing arc to another node. The node with an incoming arc is called a child of its parent.
2.1 Stochastic dominance
Let F_1(X) and F_2(X) be two possible cumulative distribution functions (CDFs) of the random variable X.
Definition 1 ([21]) We say that F_1(X) first-order stochastically dominates F_2(X) if and only if F_1(x) ≤ F_2(x) holds for all x, and we denote this relationship by F_1(x) FSD F_2(x).
Such a stochastic dominance relationship is used extensively in defining qualitative probabilistic networks (QPNs) in [21]. In a sense, QPNs are special Bayesian networks that take advantage of the dominance relationships to design efficient inference procedures for qualitative relationships. Wellman defines that a node X positively influences its child Y if and only if the following inequality holds for all x_i, x_j, and values px(Y) of the rest of Y's parents.

x_i ≤ x_j ⇒ F(y | x_j, px(Y)) FSD F(y | x_i, px(Y))  (1)

An interpretation of this inequality is that, all else being equal, increasing the value of X increases the probability of Y taking a larger value, as is shown in the following figure. In transportation problems, this tentatively says that leaving the origin at a later time increases the probability of arriving at the destination later. Analogously, we say that X negatively influences Y if the dominance relationship reverses in (1).
Figure 2. X positively influences Y.

Another implication of the dominance relationship F_1(x) FSD F_2(x) is that the following relationship holds for every monotonically increasing function g(X) [23].

∫ g(x) dF_1(x) ≥ ∫ g(x) dF_2(x)  (2)

This inequality has been applied to fastest-path planning algorithms [22], resolution of ambiguous qualitative relationships in QPNs [11], and computing bounds of probability distributions [12, 13].
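Definition 1 and inequality (2) can be checked numerically. The sketch below, with made-up probability values, verifies both the FSD relation between two discrete CDFs and the resulting ordering of expectations of an increasing function g; the helper name `fsd` is an assumption for illustration.

```python
import numpy as np

# Two pmfs on the common support x = 0..4; p1 puts more mass on larger
# values, so its CDF F1 should first-order dominate F2 (F1 <= F2).
p1 = np.array([0.05, 0.10, 0.15, 0.30, 0.40])
p2 = np.array([0.30, 0.30, 0.20, 0.15, 0.05])
F1, F2 = np.cumsum(p1), np.cumsum(p2)

def fsd(F_a, F_b):
    """F_a FSD F_b iff F_a(x) <= F_b(x) for all x (Definition 1)."""
    return bool(np.all(F_a <= F_b + 1e-12))

assert fsd(F1, F2) and not fsd(F2, F1)

# Inequality (2): for a monotonically increasing g, the expectation
# under the dominating distribution is at least as large.
x = np.arange(5)
g = x ** 2                     # an increasing function on this support
assert (g * p1).sum() >= (g * p2).sum()
```

The same check applies to any increasing g, which is exactly why (2) is useful for comparing expected travel times.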
2.2 State-space abstraction methods
A Bayesian network encodes the joint probability distribution of a set of nodes.
Each random variable can take on a set of possible values, each of which is called a state of the random variable. It is well known that the computational cost of evaluating Bayesian networks increases exponentially with the cardinality of the state spaces of the random variables [1]. Hence, reducing this cardinality emerges as an intuitive approach for computing the desired probability distributions at lower costs. The skeleton of the iterative state-space abstraction (ISSA) algorithm follows.
Algorithm 1 (ISSA [14]) Iterative State-Space Abstraction
1. Abstraction: Construct an approximated network of the original network by aggregating states and reconstructing CPTs.
2. Inference: Evaluate the approximated network to obtain approximations of interest.
3. Termination?: Check whether the algorithm should stop, using application-dependent criteria. If yes, return the current solution. Otherwise, go to the next step. The algorithm will stop when there is no superstate in the current network.
4. Refinement: Select which superstate should be split, and return to step 1.
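The loop above can be sketched on a toy one-variable problem: we bound the CDF of a single variable T whose states are aggregated into superstates. This is a minimal runnable sketch under my own assumptions, not the paper's implementation; `bound_cdf`, the halving split, and the 0.05 error tolerance are all invented for illustration, while the refinement step follows the MPSS idea of splitting the most probable superstate.

```python
import numpy as np

pr = np.array([0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05])  # exact pmf of T
F_exact = np.cumsum(pr)

def bound_cdf(groups):
    """Abstraction + inference on a single variable: counting a whole
    superstate's mass at its first (smallest) state yields a CDF that
    upper-bounds the exact one, as in the (3)/(4)-style aggregation."""
    F = np.empty_like(F_exact)
    for g in groups:                        # g is a list of state indices
        mass_before = F_exact[g[0] - 1] if g[0] > 0 else 0.0
        F[g] = mass_before + pr[g].sum()    # whole group's mass counted at once
    return F

groups = [list(range(len(pr)))]             # start fully aggregated
while True:
    F_hat = bound_cdf(groups)               # steps 1-2: abstraction + inference
    err = np.max(F_hat - F_exact)
    if err <= 0.05 or all(len(g) == 1 for g in groups):
        break                               # step 3: termination
    # step 4: refinement, split the most probable non-singleton superstate
    k = max((i for i in range(len(groups)) if len(groups[i]) > 1),
            key=lambda i: pr[groups[i]].sum())
    g = groups.pop(k)
    mid = len(g) // 2
    groups[k:k] = [g[:mid], g[mid:]]

assert np.all(F_hat >= F_exact - 1e-12)     # the result is still a valid bound
```

Each refinement recovers some distinction among the original states, so the bound tightens monotonically toward the exact CDF.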
To evaluate a given Bayesian network approximately, the state-space abstraction methods construct a very coarse version of the original network by aggregating consecutive states of some random variables [14]. Let us call these aggregated states superstates henceforth. (Since we represent a random variable with an oval node in Bayesian networks, we will use nodes and random variables interchangeably henceforth.) Since every random variable in a Bayesian network has an associated conditional probability table (CPT) that contains its probability distributions given its parents' states, the state-space abstraction procedure must also construct CPTs for all affected random variables. Specifically, if a superstate is introduced into either the state space of a random variable or the state spaces of the parents of a random variable, then we need to reconstruct the CPT of this affected random variable.
As a result, we need to compute the probability values for the new CPTs from the probability values contained in the original CPTs. Let P(A) and C(A) respectively denote the parents and children of a node A, a_i the i-th possible state of A, and [a_{i,j}] the superstate that is the aggregation of states a_i through a_j inclusively. If we replace a_i through a_j by [a_{i,j}], then we need to determine the values in the reconstructed CPTs of A and C(A). Namely, we need to determine ^Pr([a_{i,j}] | p(A)) for every possible value p(A) of P(A), where ^Pr(·) represents an approximated probability. The CPT of every child T of A must be reconstructed as well. Let PX(T) = P(T) \ {A} denote the parents of T excluding A. We need to determine the conditional probabilities ^Pr(t_k | px(T), [a_{i,j}]) for all k and px(T). By appropriately choosing formulas for this probability reassignment task, we control whether we compute the point-valued approximations [14] or bounds of the desired probability distributions [12].
Using the following formulas in ISSA will give us bounds of the distributions of T when A positively influences T.

^Pr([a_{i,j}] | p(A)) = Σ_{l=i}^{j} Pr(a_l | p(A))  (3)

^F(t_k | px(T), [a_{i,j}]) = max_{l ∈ [i,j]} F(t_k | px(T), a_l)  (4)

Formula (3) is an intuitive assignment, where the probability of a superstate is the sum of the probabilities of its components. The formula for ^Pr(t_k | px(T), [a_{i,j}]), for all k, is more complex, and we assign the values such that (4) holds, where F(t_k | px(T), a_l) ≡ Pr(T ≤ t_k | px(T), a_l) represents the conditional cumulative distribution of T.
After constructing an approximated network, we may employ any exact evaluation algorithm to compute the probability distributions of interest. When random variables positively or negatively influence one another, we can prove that we will obtain bounds of probability distributions if we apply (3) and (4) in the abstraction step [12].
For some applications, we may want to refine the approximated network for better solutions after obtaining the current approximations. We may achieve this goal by splitting superstates in the approximated network and constructing another approximated network for evaluation.
Assume that the state space of an abstracted node contains more than one superstate. (Any node whose states are aggregated is an abstracted node.) We will need to choose which superstate to split. An intuitive strategy is to split, for every abstracted node, the superstate that has the largest approximate marginal probability. This so-called most-probable-superstate (MPSS) heuristic led to satisfactory results in some experiments [14]. However, selecting the "best" superstate to split for the new approximated network is not an easy problem, and interested readers are referred to [14] for further details.
3. Relaxing stochastic dominance
In previous work, Liu and Wellman report that we can apply the ISSA algorithm to compute bounds of probability distributions when random variables positively or negatively influence others [12]. Such an algorithm allows us to explore complex networks that would render exact computation of probability distributions impractical.
The main purpose of requiring the positive/negative influence relationship between random variables is that we can aggregate states freely and obtain bounds of the exact distributions. The following derivation shows the core basis for computing bounds of travel times via state-space abstraction. Let L_1 → L_2 → ⋯ → L_n be a traveling path, and Pr(t_i) be the probability of arriving at location L_i at time T_i = t_i. Thus, given a departure time from L_1, say t_1, we can infer the distribution of the arrival time at L_2 easily, and it is F(t_2 | t_1). As we expand the partial path in a search algorithm, we compute the CDF F(t_{j+1}) of the arrival time at L_{j+1} based on the arrival time at L_j:

F(t_{j+1}) = Σ_{t_j} F(t_{j+1} | t_j) Pr(t_j),  (5)

where Pr(t_j) is actually a shorthand for Pr(t_j | t_1). The conditioning on t_1 will not be shown explicitly, for notational simplicity, henceforth. (For simplicity, we assume that one would not stop at intermediate locations, so there is really no need to distinguish arrival and departure times for intermediate locations. As a result, we will use arrival time for both.) One way to control the time for computing the distributions of arrival times is to confine the growth of the number of states of L_j [13]. We can achieve this by aggregating the states of L_j before we compute the distribution of L_{j+1}. Therefore, in general, we would still like to abstract the state space of T_j in computing the distribution of T_{j+1} after obtaining ^F(t_j). We apply (3) and (4) as follows.

^^Pr(^t_j) = Σ_{t_j ∈ ^t_j} ^Pr(t_j)  (6)

^F(t_{j+1} | ^t_j) = max_{t_j ∈ ^t_j} F(t_{j+1} | t_j)  (7)

We can let ^Pr(t_2) ≡ Pr(t_2) without any loss, although we have obtained the exact distribution for T_2 already. Hence we can apply (6) and (7) to T_j for all j ≥ 2. In (6), the double "hats" imply that the approximate probabilities are determined based on other approximated probabilities. For simplicity, a single "hat" rather than double "hats" will be used to denote any approximate probabilities when there is no risk of confusion. Also, we use the "hat" symbol over t_j to denote that the state space of T_j is aggregated when we compute an approximate distribution of T_{j+1}.
Using (6) and (7), we show F(t_{j+1}) FSD ^F(t_{j+1}) as follows.

^F(t_{j+1})  (8)
= Σ_{^t_j} ^F(t_{j+1} | ^t_j) ^^Pr(^t_j)
= Σ_{^t_j} [ max_{t_j ∈ ^t_j} F(t_{j+1} | t_j) ] [ Σ_{t_j ∈ ^t_j} ^Pr(t_j) ]
≥ Σ_{t_j} F(t_{j+1} | t_j) ^Pr(t_j) = Σ_{t_j} F(t_{j+1} | t_j) d^F(t_j)
≥ Σ_{t_j} F(t_{j+1} | t_j) dF(t_j) = F(t_{j+1})

Since every t_j is covered by exactly one ^t_j when we aggregate states, ^Pr(t_j) occurs exactly once after we completely expand the summations in the second equality. Also, each component ^Pr(t_j) of ^^Pr(^t_j) is multiplied by max_{t_j ∈ ^t_j} F(t_{j+1} | t_j), which must be no smaller than F(t_{j+1} | t_j) for all t_j covered by ^t_j, so we obtain the first inequality in (8) after recollecting all terms. Now, as we have assumed that T_j positively influences T_{j+1}, we have F(t_{j+1} | t_j) FSD F(t_{j+1} | t'_j) if t_j ≥ t'_j. In other words, F(t_{j+1} | t_j) is a non-increasing function of t_j. Also recall that at L_2, ^F(t_2) is actually an exact distribution, so it is trivially true that F(t_2) FSD ^F(t_2). Using proof by induction, we can assume that F(t_j) FSD ^F(t_j), and go on to show that F(t_{j+1}) FSD ^F(t_{j+1}). Given that F(t_j) FSD ^F(t_j) and that F(t_{j+1} | t_j) is a non-increasing function of t_j, we can apply (2) to obtain the second inequality after a simple algebraic manipulation, and thus establish F(t_{j+1}) FSD ^F(t_{j+1}).
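The bound in (8) can be checked numerically on a tiny two-leg path L_1 → L_2 → L_3. All probability numbers below are invented for illustration; the conditional CDFs are chosen so that T_2 positively influences T_3, and the check verifies that aggregating the states of T_2 with (6) and (7) yields ^F(t_3) ≥ F(t_3) pointwise, i.e., F(t_3) FSD ^F(t_3).

```python
import numpy as np

pr_t2 = np.array([0.2, 0.3, 0.3, 0.2])       # exact Pr(t2 | t1)
# F(t3 | t2): one row per t2 state, non-increasing down the rows,
# so T2 positively influences T3.
F_t3_given_t2 = np.array([
    [0.50, 0.80, 1.00],
    [0.40, 0.70, 1.00],
    [0.25, 0.60, 1.00],
    [0.10, 0.40, 1.00],
])

# Exact propagation, equation (5).
F_t3 = pr_t2 @ F_t3_given_t2

# Aggregate the t2 states into two superstates {0, 1} and {2, 3}.
groups = [[0, 1], [2, 3]]
pr_super = np.array([pr_t2[g].sum() for g in groups])                     # (6)
F_given_super = np.array([F_t3_given_t2[g].max(axis=0) for g in groups])  # (7)
F_t3_hat = pr_super @ F_given_super

assert np.all(F_t3_hat >= F_t3 - 1e-12)      # ^F upper-bounds F, as in (8)
```

Coarser groupings loosen the bound and finer ones tighten it, which is exactly the trade-off the refinement step of Algorithm 1 navigates.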
For networks that have a more complex structure than the linear one, the proof is similar but more involved. In short, when a node in the Bayesian network positively or negatively influences its child, we can apply the state-space abstraction methods to compute bounds of probability distributions. Moreover, the SSA methods will give us bounds as long as we aggregate consecutive states.
What do positive and negative influences really imply in reality? Let X and Y in Figure 2 be the departure time and arrival time of a trip, respectively. An interpretation of positive influence in transportation networks is that departing from the origin later will not increase the probability of arriving at the destination at an earlier time. This assumption seems reasonable, and arguably holds in real-world applications. Nevertheless, it is an assumption at best.
When the relationships of positive or negative influence do not hold, the second inequality in (8) will not hold because we lose the condition that F(t_{j+1} | t_j) is a non-increasing function of t_j. This, in turn, destroys the applicability of the SSA methods.
However, if the assumption of positive or negative influence is only slightly violated, we can still apply the SSA methods, with some modifications, to find bounds of probability distributions. Consider the distributions shown in Figure 3, where t_{jk} represents the k-th state of T_j. The crossing curves show that T_j does not positively influence T_{j+1}.

Figure 3. T_j weakly positively influences T_{j+1}.

However, the trend of the curves seems to support that T_j weakly positively influences T_{j+1}. In particular, both F(t_{j+1} | t_{j3}) and F(t_{j+1} | t_{j4}) first-order dominate F(t_{j+1} | t_{j1}) and F(t_{j+1} | t_{j2}). A formal definition follows.
Definition 2 Assume that a random variable X has m states x_1, x_2, …, x_m and that these states form n ≥ 1 groups: G_1 = {x_1, …, x_{b_1}}, G_2 = {x_{b_1+1}, …, x_{b_2}}, …, and G_n = {x_{b_{n-1}+1}, …, x_m}. A node X weakly positively influences its child Y if and only if F(y | x_i, px(Y)) FSD F(y | x_k, px(Y)) for all x_i ∈ G_j, x_k ∈ G_l, and px(Y), where j > l and px(Y) denotes the values of the other parents PX(Y) of Y.
When X weakly positively influences its child Y, we can apply the SSA methods to obtain bounds of the distributions of Y. Let g(x_i) be the state group that contains x_i. To compute the bounds, we approximate the conditional cumulative distribution functions F(y | x_i, px(Y)) for all x_i by the following formula before computing the approximations with Algorithm 1.

^F(y | x_i, px(Y)) = max_{x_j ∈ g(x_i)} F(y | x_j, px(Y))  (9)

Consider the example shown in Figure 3. After we apply (9) to the distributions F(t_{j+1} | t_j), both F(t_{j+1} | t_{j1}) and F(t_{j+1} | t_{j2}) are set to the values of the upper, thick curve, while F(t_{j+1} | t_{j3}) and F(t_{j+1} | t_{j4}) are set to the lower, thick curve in Figure 4.

Figure 4. Using weakly positive influence for bounding distributions
We can prove that the exact CDF first-order dominates the approximate CDF computed with these approximations:

^F(t_{j+1}) = Σ_{t_j} [ max_{t_k ∈ g(t_j)} F(t_{j+1} | t_k) ] Pr(t_j) ≥ Σ_{t_j} F(t_{j+1} | t_j) Pr(t_j) = F(t_{j+1})

After we apply (9) to the conditional CDFs, the resulting approximate CDFs make the involved random variables satisfy the positive influence relationship. As a result, we can apply the SSA methods to compute bounds of the already-approximated distributions using an even smaller number of states. By transitivity, the new bounds are also bounds of the exact distributions.
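The effect of (9) on crossing curves like those in Figure 3 can be sketched numerically. The CDF values below are invented for illustration: within each group the curves cross, but every curve in the higher group dominates every curve in the lower group; taking the pointwise max within each group restores a family of conditional CDFs that satisfies positive influence.

```python
import numpy as np

# F(t_{j+1} | t_jk) for k = 1..4; rows 1-2 cross each other and rows
# 3-4 cross each other, but both of rows 3-4 dominate both of rows 1-2,
# so T_j weakly positively influences T_{j+1} (Definition 2).
F = np.array([
    [0.60, 0.80, 1.00],   # t_j1
    [0.55, 0.85, 1.00],   # t_j2  (crosses t_j1)
    [0.30, 0.50, 1.00],   # t_j3
    [0.35, 0.45, 1.00],   # t_j4  (crosses t_j3)
])
groups = [[0, 1], [2, 3]]  # G1 = {t_j1, t_j2}, G2 = {t_j3, t_j4}

F_hat = F.copy()
for g in groups:           # (9): replace each member by the group max
    F_hat[g] = F[g].max(axis=0)

# Every curve in G2 now FSD-dominates every curve in G1 ...
assert np.all(F_hat[2] <= F_hat[0]) and np.all(F_hat[3] <= F_hat[1])
# ... and each ^F upper-bounds its exact counterpart, so distributions
# computed from F_hat bound those computed from F.
assert np.all(F_hat >= F - 1e-12)
```

After this preprocessing step, Algorithm 1 can be run unchanged, and transitivity makes its output a bound of the exact distributions as well.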
When the FSD relationship in Definition 2 reverses, we say that X weakly negatively influences Y. Under such circumstances, we replace the max operator in (9) by the min operator to obtain bounds of probability distributions analogously.
4. Superstate selection strategies
At step 4 of Algorithm 1, we have to split selected superstates to improve the quality of the approximations. As we split the superstates, we recover some distinctions among the original states, and expect the results of evaluating the Bayesian networks to improve. Liu and Wellman report theoretical and experimental analyses of this superstate selection problem for Bayesian networks [14].
As we discussed in the previous section, the superstate selection problem for computing the fastest path is not the same as that for evaluating Bayesian networks. As we gradually expand a partial path L_1 → L_2 → ⋯ → L_n to the next intermediate location L_{n+1}, we already have an approximate probability distribution for T_n. In order to confine the growth of the state spaces of arrival times for intermediate locations, we aggregate the states of T_n before computing the CDF of T_{n+1}. The problem is thus how we group states into a set of superstates, not how we select superstates for splitting. In the previous work on fastest-path planning problems, this superstate selection problem was unaddressed [13].
Liu and Wellman assume that the departure time positively influences the arrival time for any trip in applying the SSA methods for computing bounds of travel times. Figure 2 shows such an example, letting X and Y be the departure and arrival times, respectively. For computing the travel times of a path, they assume that the departure time from L_j positively influences the arrival time at L_{j+1}. Namely, F(t_{j+1} | t'_j) ≥ F(t_{j+1} | t_j) for all t_{j+1} and t_j > t'_j.
Let t_{jk} be the k-th state of T_j. We make the assumption more specific by assuming that the values of F(t_{j+1} | t_{jk}) will not deviate from those of F(t_{j+1} | t_{j(k+1)}) significantly. Namely, for a small ε and for all k and t_{j+1}, we assume

F(t_{j+1} | t_{jk}) − F(t_{j+1} | t_{j(k+1)}) ≤ ε.  (10)

This assumption should hold for transportation networks, as we typically do not expect normal traffic conditions to change drastically within a short time period. The assumed inequality deviates from reality when t_{j+1} is extremely small or large: F(t_{j+1} | t_{jk}) will be 0 or 1, respectively, for all t_{jk}, and the differences will be 0. Nevertheless, the inequality still holds.
Now, although we are computing bounds for the desired distributions, we would like to make the bounds as close to the actual distributions as possible. Assume that T_j has m states t_{j1}, t_{j2}, …, t_{jm} and that we aggregate these states into n groups: S_1 = {t_{j1}, …, t_{jb_1}}, S_2 = {t_{j(b_1+1)}, …, t_{jb_2}}, …, and S_n = {t_{j(b_{n-1}+1)}, …, t_{jm}}. Let g'(t_{jk}) denote the group that contains t_{jk}. To minimize the errors, referring to (8) and its derivation, we would like to minimize the following difference.

δ = Σ_{^t_j} ^F(t_{j+1} | ^t_j) ^^Pr(^t_j) − Σ_{t_j} F(t_{j+1} | t_j) ^Pr(t_j)
  = Σ_{^t_j} [ max_{t_j ∈ ^t_j} F(t_{j+1} | t_j) ] [ Σ_{t_j ∈ ^t_j} ^Pr(t_j) ] − Σ_{t_j} F(t_{j+1} | t_j) ^Pr(t_j)  (11)
  = Σ_{t_j} [ F(t_{j+1} | min_{t_k ∈ g'(t_j)} t_k) − F(t_{j+1} | t_j) ] ^Pr(t_j)
  ≤ [0 + ε ^Pr(t_{j2}) + ⋯ + (b_1 − 1) ε ^Pr(t_{jb_1})]
    + [0 + ε ^Pr(t_{j(b_1+2)}) + ⋯ + (b_2 − b_1 − 1) ε ^Pr(t_{jb_2})]
    + ⋯
    + [0 + ε ^Pr(t_{j(b_{n−1}+2)}) + ⋯ + (b_n − b_{n−1} − 1) ε ^Pr(t_{jm})]

The first two equalities follow directly from (8). The third equality also follows from (8), adding that max_{t_j ∈ ^t_j} F(t_{j+1} | t_j) is equal to the CDF of T_{j+1} given the smallest t_k in g'(t_j), because T_j positively influences T_{j+1}. Applying (10) gives us the inequality in (11), where the zeros result from the fact that in each S_i, one and only one CDF subtracts itself.
The right-hand side of the inequality in (11) gives us an upper bound of the difference δ. Therefore, one way to minimize δ is to minimize the upper bound. Let b_0 = 0 and b_n = m. When (10) holds, a heuristic for determining how we aggregate the states of T_j into n groups is to minimize the following quantity.

Σ_{k=0}^{n−1} Σ_{i=b_k+1}^{b_{k+1}} (i − b_k − 1) ^Pr(t_{ji})

Notice that the contribution of each t_{jk} is ^Pr(t_{jk}) multiplied by a weighting factor that is determined by the location of t_{jk} in its group g'(t_{jk}). In contrast, the MPSS heuristic, which was proposed for general Bayesian networks and discussed in Section 2.2, leads us to use Σ_{t_j ∈ ^t_j} ^Pr(t_j) as the guidance for superstate selection.
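The grouping heuristic can be sketched by brute force on a small state space: enumerate all ways of splitting the m states into n contiguous groups and pick the boundaries that minimize the weighted sum above. The probability values and helper names below are invented for illustration; a practical implementation would use dynamic programming rather than enumeration, but the objective is the same.

```python
from itertools import combinations

pr = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]   # ^Pr(t_ji), i = 1..m
m, n = len(pr), 3

def cost(boundaries):
    """Weighted-mass objective of Section 4 for one contiguous grouping;
    boundaries = (b_1, ..., b_{n-1}), with b_0 = 0 and b_n = m."""
    bs = (0,) + boundaries + (m,)
    return sum(
        (i - bs[k] - 1) * pr[i - 1]          # states are 1-indexed in the paper
        for k in range(n)
        for i in range(bs[k] + 1, bs[k + 1] + 1)
    )

best = min(combinations(range(1, m), n - 1), key=cost)
# The heuristic places high-probability states near the front of their
# group, where the weighting factor (i - b_k - 1) is zero; here the
# heavy state t_j3 ends up opening (indeed, being) its own group.
print(best, cost(best))
```

Note the contrast with MPSS: MPSS looks only at total group mass, while this objective also rewards putting probable states where the telescoped ε-error of (10) vanishes.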
5. Applications
The techniques presented in this paper should be helpful for any application that needs to compute probability distributions of random variables. Random variables can represent travel times in path planning problems, job processing times in task planning problems, etc. One may also apply the methods to resolving tradeoffs in QPNs [11, 19]. Due to space limitations, we provide only the outline of an application to path planning problems below.
Hall shows that we cannot directly apply the principle of dynamic programming to path planning in networks with time-dependent arc weights [5]. Kaufman and Smith extend the applicability of Dijkstra's algorithm to transportation networks with time-dependent travel costs by introducing the concept of consistent link travel times [9]. Wellman et al. generalize the concept to stochastic consistency in stochastic networks [22]. Liu and Wellman apply the SSA methods to tackle path planning problems when we have large stochastic networks [13]. In both [22] and [13], strict stochastic dominance relationships must hold among the related distributions of travel times, which, as discussed in [9], is at best a good approximation to reality. The concepts of weakly positive/negative influences in Definition 2 extend the applicability of the algorithms proposed in these previous efforts.
Assume that we are given a transportation network in which the departure times weakly positively influence the arrival times, as shown in Figure 3. As discussed in Section 3., we can create an approximation of the transportation network by formula (9) so that we can apply the path planning algorithm proposed in [13]. Also, the technique discussed in Section 4. provides context-specific guidelines for improving the quality of the bounds for path planning problems, and we can implement these guidelines in the framework of [13] as well.
6. Conclusions
Assuming the positive and negative influence relationships among random variables, we can apply the state-space abstraction methods to computing bounds of probability distributions [12, 14]. Applications of such bounds include inferring qualitative relationships in qualitative probabilistic networks [11] and searching for fastest paths in stochastic transportation systems [13].
However, the assumption of positive and negative influences limits the applicability of the existing methods. This paper defines the concepts of weakly positive influence and weakly negative influence among random variables. When random variables have such relationships, the state-space abstraction methods remain applicable after a few revisions. Also, we show that the superstate selection strategies proposed for general Bayesian networks may not work well in special domains, and we find a better heuristic designed for fastest-path planning problems.
References
[1] Cooper, G. F., The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence 42, 393–405, 1990.
[2] Dagum, P. and Luby, M., An optimal approximation algorithm for Bayesian inference, Artificial Intelligence 93, 1–27, 1997.
[3] D'Ambrosio, B., Inference in Bayesian networks, AI Magazine 20(2), 21–36, Summer 1999.
[4] Draper, D. L. and Hanks, S., Localized partial evaluation of belief networks, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 170–177, 1994.
[5] Hall, R. W., The fastest path through a network with random time-dependent travel times, Transportation Science 20(3), 182–188, 1986.
[6] Heckerman, D. E., Mamdani, A., and Wellman, M. P., Real-world applications of Bayesian networks, Communications of the ACM 38(3), 24–26, 1995.
[7] Horvitz, E., Suermondt, H. J., and Cooper, G. F., Bounded conditioning: Flexible inference for decisions under scarce resources, Proceedings of the 5th Workshop on Uncertainty in Artificial Intelligence, 182–193, 1989.
[8] Jensen, F. V., An Introduction to Bayesian Networks, Springer-Verlag, New York, 1996.
[9] Kaufman, D. E. and Smith, R. L., Fastest paths in time-dependent networks for intelligent vehicle-highway systems applications, IVHS Journal 1(1), 1–11, 1993.
[10] Koller, D., Lerner, U., and Angelov, D., A general algorithm for approximate inference and its application to hybrid Bayes nets, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 324–333, 1999.
[11] Liu, C.-L. and Wellman, M. P., Incremental tradeoff resolution in qualitative probabilistic networks, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 338–345, 1998.
[12] Liu, C.-L. and Wellman, M. P., Using qualitative relationships for bounding probability distributions, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 346–353, 1998.
[13] Liu, C.-L. and Wellman, M. P., Using stochastic-dominance relationships for bounding travel times in stochastic networks, Proceedings of the 2nd International IEEE Conference on Intelligent Transportation Systems, 55–60, 1999.
[14] Liu, C.-L. and Wellman, M. P., Evaluation of Bayesian networks with flexible state-space abstraction methods, International Journal of Approximate Reasoning 30(1), 1–39, 2002.
[15] Minka, T. P., Expectation propagation for approximate Bayesian inference, Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, 362–369, 2001.
[16] Nicholson, A. E. and Jitnah, N., Using mutual information to determine relevance in Bayesian networks, Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence, 399–410, 1998.
[17] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Inc., 1988.
[18] Poole, D., Average-case analysis of a search algorithm for estimating prior and posterior probabilities in Bayesian networks with extreme probabilities, Proceedings of the 13th International Joint Conference on Artificial Intelligence, 606–612, 1993.
[19] Renooij, S. and van der Gaag, L. C., Enhancing QPNs for trade-off resolution, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 559–566, 1999.
[20] van Engelen, R. A., Approximating Bayesian belief networks by arc removal, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(8), 916–920, 1997.
[21] Wellman, M. P., Fundamental concepts of qualitative probabilistic networks, Artificial Intelligence 44, 257–303, 1990.
[22] Wellman, M. P., Ford, M., and Larson, K., Path planning under time-dependent uncertainty, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 532–539, 1995.
[23] Whitmore, G. A. and Findlay, M. C. (Eds.), Stochastic Dominance: An Approach to Decision Making Under Risk, D. C. Heath and Company, 1978.