3.4 Why New Algorithm?
3.4.2 Some Approaches for Large-Scale Inference
Since both exact methods break down when applied to large-scale networks, how do we draw inferences in large, complex systems? Three main methods have been proposed for large systems, namely, multiply sectioned Bayesian networks [28], noisy Or-gate models [26], and hybrid inference methods [7]. In the following paragraphs, we briefly introduce and discuss these methods.
Multiply sectioned Bayesian networks (MSBNs)
A multiply sectioned Bayesian network is an extension of the Bayesian network model that supports flexible modeling in large and complex problem domains. An MSBN consists of a set of interrelated Bayesian sub-nets, each of which encodes an agent’s knowledge concerning a sub-domain. Global consistency among the sub-nets in an MSBN is achieved by communication. Once the information for all the agents is updated, we obtain complete knowledge of the system. For example, in medicine, we can divide the body into different parts, such as the brain, the respiratory system, and the gastrointestinal system, each with a different but related knowledge domain.
Accordingly, we have several types of specialists, each of whom examines the part within his or her expertise and communicates with the others to diagnose the disease. Thus, we are not required to draw an inference in one large-scale system, but only in several small sectioned networks.
The inference method is called the junction forest algorithm (see Figure 3.5). It is similar to a two-layered junction tree. The first layer lies within the sectioned networks: for each agent, we construct a small junction tree to draw the inference. The second layer lies across the agents: we view each agent as a super clique and build a junction tree over the agents so that they can communicate with each other. Since all sectioned networks are small, we can efficiently construct each junction tree and draw an inference, and can therefore handle the large-scale problem efficiently.
Figure 3.5: (a) G is the moral graph of the Bayesian network and the union of the graphs in (b). (b) G is sectioned into four graphs. (c) Junction forest of G; each square represents a sub-graph in (b) and is called an agent.
Noisy Or-gate model
Recall that Bayesian networks require the conditional probabilities of each variable given every combination of the values of its parents. Therefore, if each variable has only two states and a variable has p parents, we must specify 2^p conditional probabilities for that variable. When p is large, the storage requirements as well as the inference computations become infeasible.
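To make this growth concrete, the following minimal sketch counts the parent configurations of a binary node and estimates the storage for one probability per configuration. The function names and the figure of 8 bytes per entry are illustrative assumptions, not part of the original text.

```python
def full_cpt_rows(p: int, states: int = 2) -> int:
    """Number of parent configurations for a node with p parents."""
    return states ** p


def cpt_bytes(p: int, bytes_per_entry: int = 8) -> int:
    """Storage for one probability per parent configuration.

    bytes_per_entry = 8 is an assumption (one double-precision float).
    """
    return full_cpt_rows(p) * bytes_per_entry


for p in (2, 10, 20, 30):
    print(p, full_cpt_rows(p), cpt_bytes(p))
```

Under these assumptions, a single node with 30 binary parents already needs 2^30 entries, that is, 8 GiB of storage.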
The idea of the noisy Or-gate model is to avoid specifying every entry in the conditional probability table (CPT). To do so, we assume that each parent acts on the child independently of the other parents. Thus, the probability that the parents together fail to produce the effect in the child is simply the product of the individual failure probabilities of the parents that are present.
As a simple example, medical causal models commonly assume that all the possible causes of a symptom act independently. Suppose that a person who has either tuberculosis (X) or cystic fibrosis (Y) may show an abnormal lung X-ray (E). Further, tuberculosis has a failure rate of 70% with respect to showing up on an X-ray, and cystic fibrosis has a corresponding failure rate of 40%. The noisy-OR model states that if someone has both tuberculosis and cystic fibrosis, the X-ray will have a detection failure rate of 0.4 × 0.7 = 0.28 = 28%. In other words, the failures combine just like independent coin tosses. The CPT for this example is given in Table 3.1. The probabilities of having an abnormal lung X-ray (E = 1) are obtained as 1 − P(E = 0).
X    Y    P(E = 0)    P(E = 1)
0    0    1           0
1    0    0.7         0.3
0    1    0.4         0.6
1    1    0.28        0.72
Table 3.1: X-ray noisy-OR example. The chance that an X-ray (E) will fail to detect two medical conditions X and Y is just the product of the individual failure chances.
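The entries of Table 3.1 can be reproduced from the two failure rates alone. The sketch below is one possible way to generate a noisy-OR CPT; the function name noisy_or_cpt is hypothetical, and no leak term is modeled, so the effect never occurs when no cause is present.

```python
from itertools import product


def noisy_or_cpt(failure_rates):
    """Return {parent_states: (P(E=0), P(E=1))} for a noisy-OR node.

    failure_rates[i] is the probability that cause i, when present,
    fails to produce the effect. No leak term is modeled (assumption).
    """
    cpt = {}
    for states in product((0, 1), repeat=len(failure_rates)):
        p_fail = 1.0
        for present, q in zip(states, failure_rates):
            if present:
                p_fail *= q  # independent failures multiply
        cpt[states] = (p_fail, 1.0 - p_fail)
    return cpt


# X = tuberculosis (failure rate 0.7), Y = cystic fibrosis (failure rate 0.4)
table = noisy_or_cpt([0.7, 0.4])
for (x, y), (p0, p1) in table.items():
    print(x, y, round(p0, 2), round(p1, 2))
```

Only one parameter per parent is stored; the 2^p table entries are computed on demand.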
Because the conditional probabilities are defined by this rule, we do not need to store all the CPT entries, which avoids the problem of limited memory space. In addition, we can draw inferences efficiently by exploiting the relationship among the conditional probabilities. Therefore, the noisy Or-gate model is popular for dealing with large-scale systems.
Hybrid inference
Hybrid inference refers to the simultaneous use of exact and approximate inference. For example, in the junction tree algorithm, if some cliques are too large for their potentials to be computed exactly, we can use approximate inference for those cliques, that is, compute approximate potentials through simulation. We do not use approximate inference in all cliques because of precision: the greater the number of cliques approximated, the lower the precision obtained.
Since approximate methods such as Gibbs sampling [11] are used for the large cliques, the computational complexity is not exponential in the number of variables. Another advantage is that the hybrid inference method can handle a relatively wide range of distributions and mixtures of distributions. Therefore, this method can be applied to large-scale systems.
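The trade-off between exact computation and simulation can be illustrated on the X-ray example of Table 3.1. The sketch below is not the hybrid algorithm of [7], and it uses plain forward sampling rather than Gibbs sampling; it only contrasts an exact marginal obtained by enumeration with a simulated estimate. The prior probabilities P_X and P_Y are hypothetical values chosen for illustration.

```python
import random
from itertools import product

# Hypothetical priors for the two diseases (not given in the text).
P_X, P_Y = 0.10, 0.05
# Failure rates from the X-ray example of Table 3.1.
Q_X, Q_Y = 0.7, 0.4


def p_abnormal(x, y):
    """Noisy-OR probability of an abnormal X-ray given the causes present."""
    return 1.0 - (Q_X ** x) * (Q_Y ** y)


def exact_marginal():
    """P(E = 1) by enumerating every parent configuration."""
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        px = P_X if x else 1.0 - P_X
        py = P_Y if y else 1.0 - P_Y
        total += px * py * p_abnormal(x, y)
    return total


def sampled_marginal(n, seed=0):
    """P(E = 1) estimated from n forward samples of (X, Y, E)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.random() < P_X
        y = rng.random() < P_Y
        hits += rng.random() < p_abnormal(x, y)
    return hits / n


print("exact:", exact_marginal())
for n in (100, 1000, 20000):
    print(n, "samples:", sampled_marginal(n))
```

Enumeration visits every parent configuration, whereas the cost of the sampled estimate depends only on the number of samples; its error shrinks as more samples are drawn.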
In addition to these three methods, other approaches have been proposed, such as divorcing, which adds a hidden variable to the structure to avoid large junction tree cliques. However, these approaches are suitable only for some special cases and perform worse in general. Therefore, we do not discuss them here.
3.4.3 Why New Algorithm?
The three methods mentioned above each have problems. In the MSBN approach, sectioning the graph is subject to the restriction that the parents of a node cannot be separated. Therefore, if the original network is so complicated that it has large cliques, the size of the cliques will not change even after sectioning into small agents. The only benefit of sectioning is that we spend less time on triangulation and on the construction of the junction tree. Therefore, the MSBN approach is appropriate only for some cases.
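The reason the clique size survives sectioning can be checked on a small case. After moralization, a node and its parents form a complete subgraph, so they must lie together in one clique of any junction tree, and the MSBN restriction prevents sectioning from splitting them. The following sketch verifies this on a hypothetical network with one binary node E and five binary parents; the helper names are illustrative.

```python
from itertools import combinations


def moralize(parents):
    """Build the moral graph of a DAG given as {node: [its parents]}."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:  # keep every parent-child edge, undirected
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # connect ("marry") the parents
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_complete(adj, nodes):
    """True if every pair of the given nodes is adjacent."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


# Hypothetical network: one binary node E with five binary parents.
dag = {"E": ["A", "B", "C", "D", "F"]}
family = ["E"] + dag["E"]
moral = moralize(dag)

print(is_complete(moral, family))  # whether the family is a complete subgraph
print(2 ** len(family))            # entries in a clique table covering it
```

In this example the clique containing the family has at least six variables no matter how the rest of the network is sectioned.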
The noisy Or-gate model may be suitable for medical causal models, but in other fields the conditional distributions usually do not follow the noisy-OR relationship, and the resulting inference will be poor. Therefore, this method is also not suitable for general cases.
Hybrid inference, which is the most general of the three methods, can be used in different domains. However, the precision of the inference is questionable if we do not draw an adequate number of samples when using the approximate methods.
Therefore, we do not yet have an approach that can efficiently handle large-scale networks and obtain an exact or a good approximate result in the general case. In the next chapter, we propose a new approach to solve the problem of drawing inferences in large and complex systems.
Chapter 4
New Inference Approach
4.1 Introduction
In this chapter, we propose a new algorithm, called the KLA-algorithm, for drawing inferences from Bayesian networks. In developing the KLA-algorithm, we take the following two facts into consideration. First, cluster methods (Section 3.3.1) are efficient with respect to the propagation of probabilities but require a considerable amount of memory and time to build the junction tree. In order to reduce the memory space, we look for the minimum clique and then dispense with the junction tree structure. Second, conditional methods (Section 3.3.2) compute the probabilities directly on the original network instead of building a cluster tree; however, only marginal probabilities, and not joint probabilities, are attainable. Conditional methods also suffer from a combinatorial explosion when dealing with a large set of cut-set nodes. It is therefore desirable that the inference algorithm compute the joint probabilities directly and adopt a local conditioning approach [9] to reduce the size of the cut-set. On the basis of these observations, we have developed a novel algorithm that combines the cliques of cluster methods with the structure of conditional methods to avoid the computational inefficiency and the memory space problem, and that uses the concept of local conditioning to decrease the number of cut-set nodes. In addition, the proposed algorithm is capable of efficiently computing the marginal, joint, and conditional probabilities for large and complex systems.
The KLA-algorithm can be classified neither as an exact inference method nor as an approximate inference method, because it allows one to trade off the quality of the approximation against the computational time. In small and simple networks (e.g., poly-trees), the KLA-algorithm performs as well as other exact inference methods. In large and complex networks, the KLA-algorithm approximates the probabilities more accurately than other approximate inference engines. The KLA-algorithm does not require any sample data and produces relatively small cliques. Because of these enhancements, the algorithm remarkably extends the range of realizable network complexity. Furthermore, the joint and conditional probabilities can easily be computed at the same time. Note that traditional inference algorithms can compute only either joint or conditional probabilities.
This chapter is organized as follows. In Section 4.2, we present our algorithm in detail. We present the algorithm structure in Section 4.2.1. The assignment of a list of conditioning variables to each node and the propagation are considered in Section 4.2.2. The methods of computing joint probabilities and conditional probabilities are given in Sections 4.2.3 and 4.2.4, respectively. Finally, in Section 4.3, we discuss how to trade off the quality of the approximation against the computation time, and we analyze the complexity and compare it with that of other algorithms.