Complexity of KLA-Algorithm - A Time and Accuracy Trade-off Algorithm for Inference in Large Bayesian Networks

Since this algorithm adopts the conditioning method, its complexity is O(N · e^p), where p is the size of the largest local loop cut-set. If we can reduce p by the approximate methods, the complexity drops substantially. Computing conditional probabilities by adding the virtual node costs O(e^p), which the approximate method can reduce, and the forward-backward method for computing conditional probabilities costs O(N · e^p). The size of the local loop cut-sets therefore plays an important role in determining the complexity of the KLA-algorithm. Although we can reduce it by using the approximate methods, some types of graphs will still have an intractable computing time. In the following paragraphs, we discuss the time performance of the algorithm on different types of graph structures.
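As a rough illustration of why p dominates the cost, the sketch below counts the work done by plain cut-set conditioning: one propagation pass over the network for every joint instantiation of the cut-set. The function name, the reading of N as the number of nodes, and the use of binary states (so that the exponential base is 2 rather than e) are assumptions made for illustration; this is not the KLA-algorithm itself.

```python
def conditioning_work(n_nodes, cutset_size, n_states=2):
    """Count node updates for plain cut-set conditioning (illustrative only).

    Assumes one O(n_nodes) propagation pass per joint instantiation of the
    cut-set, and n_states states per cut-set node.
    """
    instantiations = n_states ** cutset_size
    return n_nodes * instantiations


# Hypothetical 105-node network: shrinking the cut-set from 20 to 5 nodes.
full = conditioning_work(105, 20)
reduced = conditioning_work(105, 5)
print(full, reduced, full // reduced)
```

Under these assumptions, every node removed from the cut-set halves the work, which is why reducing p matters far more than reducing N.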

• Poly-tree structure (no loop)

In this structure, the time performance of the KLA-algorithm is the same as that of the junction tree algorithm. Since there is no loop, we just propagate messages to all the nodes. The computing complexity of both algorithms is O(N). The structure is shown in Figure 4.9(a).

• Multiply connected networks with few loops

In this case, the graph is not very complex and the cliques are small. The junction tree algorithm outperforms the KLA-algorithm in computing time because of the trade-off between memory space and computation time [10]. This trade-off arises when we aggregate several small cliques into one large clique: we need more memory space to store the cliques, but the number of cliques decreases, which saves propagation time. The computation time for propagating large cliques also grows exponentially, but it does not become the dominant cost until the size of the clique exceeds 1 GB. However, since the graph is simple, the computation time of both methods is tractable. The structure is shown in Figure 4.9(b).

• Multiply connected networks with many long loops

In this structure, the conditioning method breaks down because of the large size of the loop cut-sets. The junction tree suffers as well, since its large cliques contain not only the parents of a node but also other nodes in the loops. However, with our approximate method, we can reduce the size of the local loop cut-sets and still obtain nearly the same result as exact inference. The structure is shown in Figure 4.9(c).

Figure 4.9: (a) Poly-tree structure. (b) Multiply connected networks with few loops. (c) Multiply connected networks with many long loops. (d) Multiply connected networks with many short loops.

• Multiply connected networks with many short loops

This structure leads to intractable computing time for all the exact methods. Since all the loops are short, the KLA-algorithm cannot reduce the local loop cut-sets without sacrificing precision. Therefore, other methods are needed to solve this problem. The structure is shown in Figure 4.9(d).
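The distinction between a poly-tree and a network with loops in the cases above can be checked mechanically. The following is a minimal sketch, assuming the network is given as a list of parent-to-child arcs over nodes numbered from 0; the function names and example graphs are hypothetical. It counts the independent loops of the undirected skeleton, so a result of zero corresponds to the poly-tree case. It does not distinguish long loops from short ones.

```python
def independent_loop_count(n_nodes, arcs):
    """Return the number of independent loops in the undirected skeleton.

    Uses the identity loops = E - N + C, where C is the number of connected
    components, found with a small union-find. Assumes `arcs` lists each
    parent -> child pair once, with nodes numbered 0 .. n_nodes - 1.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n_nodes
    for a, b in arcs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(arcs) - n_nodes + components


def is_polytree(n_nodes, arcs):
    """A network with no loop in its skeleton needs no conditioning."""
    return independent_loop_count(n_nodes, arcs) == 0


# Hypothetical examples: a chain (no loop) and a diamond (one loop).
chain = [(0, 1), (1, 2), (2, 3)]
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(is_polytree(4, chain), independent_loop_count(4, diamond))
```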

Chapter 5 Experiments

To verify the KLA-algorithm on different graph structures, we first design a series of experiments that compare it with the junction tree algorithm and discuss precision and computation time. Second, we apply the KLA-algorithm to real-world data and ozone level detection in order to carry out some simulations.

5.1 Verification of KLA-Algorithm

We build seven structures with 15, 30, 45, 60, 75, 90, and 105 nodes for the propagation test; the arcs are connected randomly and each node has at most four parents. By comparison with the junction tree algorithm, we discuss the memory, precision, and computation time of both algorithms. For the given evidence, we compare the two approaches to computing conditional probabilities in the KLA-algorithm. The results of the simulation follow.
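One way to generate test structures of this kind is sketched below. The text does not say how the random arcs were drawn, so the procedure here is an assumption: each node picks between one and four parents uniformly from the lower-numbered nodes, which guarantees an acyclic graph. The function name `random_dag` and the `seed` parameter are hypothetical.

```python
import random


def random_dag(n_nodes, max_parents=4, seed=None):
    """Return {node: sorted list of parents} for a random DAG.

    Assumption: every non-root node draws 1..max_parents parents from the
    nodes numbered before it, so arcs always point from lower to higher
    numbers and no directed cycle can form.
    """
    rng = random.Random(seed)
    parents = {0: []}
    for node in range(1, n_nodes):
        k = rng.randint(1, min(max_parents, node))
        parents[node] = sorted(rng.sample(range(node), k))
    return parents


# The seven structure sizes used in this section.
structures = {n: random_dag(n, seed=n) for n in (15, 30, 45, 60, 75, 90, 105)}
print({n: sum(len(ps) for ps in g.values()) for n, g in structures.items()})
```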

As shown in Figure 5.1, structures with different numbers of nodes have different complexities. In the experiments that follow, we use graph complexity for the discussion. Notice that the correlation between the complexity and the number of nodes is not always positive; see Section 4.4.1. Figure 5.1 shows the seven structures and their corresponding complexities.

[Figure 5.1 plot: x-axis, number of nodes (10 to 110); y-axis, graph complexity (3.5 to 6.5).]

Figure 5.1: Number of nodes vs. graph complexity.

In Figure 5.2, we discuss the memory space used while the algorithm runs. A large memory footprint is not only computationally inefficient but also impractical. The clique size is the number of nodes in the clique. If a clique contains 25 nodes and each node has two states, the clique table requires 2²⁵ × 32 = 2³⁰ ≈ 10⁹ bits (about 1 Gbit), since each float value needs 32 bits. We can observe that the space required by the junction tree algorithm increases exponentially as the graph complexity increases. In contrast, the space required by the KLA-algorithm is fixed, since its maximum clique depends only on the maximum number of parents. Because each node in these structures has at most four parents, the maximum clique size in the KLA-algorithm is five (four parents and the corresponding node).
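The arithmetic in this paragraph can be reproduced with a few lines. This is a minimal sketch; the function name is hypothetical, and the defaults (binary nodes, 32-bit floats) are the values stated in the text.

```python
def clique_table_bits(n_nodes, n_states=2, bits_per_entry=32):
    """Bits needed to store one clique potential table.

    The table has one entry per joint state of the clique, i.e.
    n_states ** n_nodes entries of bits_per_entry bits each.
    """
    return n_states ** n_nodes * bits_per_entry


jt_bits = clique_table_bits(25)   # a 25-node clique, as in the example above
kla_bits = clique_table_bits(5)   # four parents plus the node itself
print(jt_bits, kla_bits)          # 1073741824 and 1024
```

A five-node clique needs about a million times less storage than a 25-node clique, which is the gap Figure 5.2 reflects.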

Since the most important task in Bayesian networks is computing the conditional probabilities given the evidence, the system randomly chooses five evidence nodes.

The KLA-algorithm has two methods for computing the conditional probabilities: the VN-method and the FB-method. We compare both methods and their approximate variants with the junction tree algorithm to assess their performance. The simulation results follow.

Computation time vs. graph complexity

(VN-method compared with junction tree algorithm)

[Figure 5.2 plot: x-axis, graph complexity (3.5 to 6.5); y-axis, maximum clique size (number of nodes in a clique, 5 to 30); legend: JT Algorithm, New Algorithm.]

Figure 5.2: Graph complexity vs. maximum clique size. The size of a clique is the number of nodes in the clique. As the graph complexity increases, the maximum clique size grows exponentially in the junction tree algorithm but stays fixed in the KLA-algorithm.

The computation time covers only the inference of the conditional probabilities; the time for constructing the system is not counted. The number of levels is used to reduce the local loop cut-sets. For example, five levels means that, for each node, we keep only the local loops whose shortest path is less than five levels. Hence, the fewer the levels, the smaller the local loop cut-set and the lower the computation time. In Figure 5.3, we can see that the junction tree always has a low computation time for networks with a complexity of less than six. However, when the complexity exceeds six, its computation time increases exponentially and it requires a large memory space. The VN-method has intractable computation time on complex structures (green line). By using the approximate methods, however, we can make the computation time grow slowly as the graph becomes more complex. With fewer than three levels, we obtain the result within 10 s for all seven structures.

The computation time of five levels (red line) and four levels (blue line) decreases dramatically in the case of the most complicated structure. This is because the structure of

(Graph complexity)    G1(3.7505)   G2(5.1459)   G3(5.2319)   G4(5.8076)
Junction Tree         0.047        0.0531       0.0468       0.1906
KLA(VN) (all level)   0.5266       21.531       366.812      161.563
KLA(VN) (5-level)     0.5797       2.8032       23.0363      80.156
KLA(VN) (4-level)     0.5125       0.3718       6.109        9.532
KLA(VN) (3-level)     0.3141       0.5874       0.3626       0.766
KLA(VN) (2-level)     0.1156       0.1406       0.1874       0.297
KLA(VN) (1-level)     0.0938       0.0842       0.1812       0.297

(Graph complexity)    G5(5.8854)         G6(6.2131)         G7(6.2464)
Junction Tree         0.4718             6.1968             Out of memory
KLA(VN) (all level)   Intractable time   Intractable time   Intractable time
KLA(VN) (5-level)     2.21E+03           3.51E+03           860
KLA(VN) (4-level)     143.562            221.906            9.078
KLA(VN) (3-level)     0.844              1.625              2.031
KLA(VN) (2-level)     0.297              0.563              0.719
KLA(VN) (1-level)     0.281              0.594              0.438

Table 5.1: Computation time (seconds) of VN-method with different numbers of level approximations and junction tree algorithm. The corresponding graph is shown in Figure 5.3.

the most complex graph (105-node structure) has more long loops than that of the next most complex graph (90-node structure). Therefore, when we consider only five levels for calculation, the 105-node structure has fewer local loop cut-sets than the 90-node structure and spends less time on computation.

Precision vs. graph complexity

(VN-method compared with junction tree algorithm)

The VN-method is a target-oriented method that calculates only the probabilities of the nodes of interest. In this simulation, we therefore focus on the conditional probability of the last node and compare it with the exact value calculated by the junction tree algorithm. Further, since the junction tree algorithm cannot handle the 105-node structure, the exact value for that structure is estimated by the five-level approximation. We use the K-L divergence to measure the difference between the exact value and the approximate value. The K-L divergence is defined to be

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}, \tag{5.1}$$


Figure 5.3: Computation time (seconds) of the VN-method with different numbers of level approximations and of the junction tree algorithm. The computation time is plotted on a logarithmic scale. The junction tree algorithm lacks one point because it cannot run on the most complex structure (the memory is not sufficient). The VN-method with all levels has only four points because its computation time is intractable for the last three structures.

and is only defined when P(i) > 0 and Q(i) > 0 for all values of i, and when P and Q each sum to 1. Typically, P represents the exact distribution of the data and Q represents an approximation of P. When the K-L divergence is smaller than 10⁻², we consider the approximate value close to the exact value and a good approximation.
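The K-L divergence and the 10⁻² acceptance threshold described above can be sketched as follows. The function names are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math


def kl_divergence(p, q):
    """K-L divergence sum_i P(i) * log(P(i) / Q(i)) for discrete P and Q.

    Enforces the conditions stated in the text: all entries strictly
    positive and both distributions summing to 1. Natural log is assumed.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of states")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("P and Q must be strictly positive")
    if not (math.isclose(sum(p), 1.0, abs_tol=1e-9)
            and math.isclose(sum(q), 1.0, abs_tol=1e-9)):
        raise ValueError("P and Q must each sum to 1")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def is_good_approximation(exact, approx, threshold=1e-2):
    """Apply the acceptance rule: divergence below the threshold."""
    return kl_divergence(exact, approx) < threshold


# Hypothetical two-state posteriors for a single node.
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
print(is_good_approximation([0.7, 0.3], [0.69, 0.31]))
```

Note that the divergence is not symmetric, so the exact distribution must be passed as the first argument.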

In Figure 5.4, keeping all levels yields the exact value; however, for graphs of large complexity the computation time is intractable. Therefore, the line for all levels (blue line) has only four values. The K-L divergence tends to increase as the number of levels decreases. For example, the 60-node structure, with a complexity of 5.8076, has a small divergence under the five-level approximation. Under the four-level approximation, however, the K-L divergence increases dramatically, which means that some five-level loops contain influential top nodes that we discard. Overall, we can always obtain a good approximate value by keeping only some levels.

Computation time vs. graph complexity

(Graph complexity)    G1(3.7505)   G2(5.1459)   G3(5.2319)   G4(5.8076)
KLA(VN) (all level)   7.32E-09     1.98E-09     9.43E-10     3.87E-09
KLA(VN) (5-level)     7.32E-09     9.08E-05     0.0813       3.87E-09
KLA(VN) (4-level)     7.32E-09     9.08E-05     0.0034       0.0133
KLA(VN) (3-level)     2.66E-04     7.62E-05     0.003        0.02
KLA(VN) (2-level)     0.0015       3.06E-04     3.04E-04     0.1506
KLA(VN) (1-level)     0.0015       1.74E-04     0.0022       0.1788

(Graph complexity)    G5(5.8854)   G6(6.2131)   G7(6.2464)
KLA(VN) (all level)   NaN          NaN          NaN
KLA(VN) (5-level)     5.08E-05     5.62E-05     0 (Basis)
KLA(VN) (4-level)     1.99E-08     6.08E-05     2.01E-06
KLA(VN) (3-level)     0.0013       1.51E-05     0.00E+00
KLA(VN) (2-level)     2.55E-04     1.29E-05     7.25E-07
KLA(VN) (1-level)     2.55E-04     0.0016       9.87E-07

Table 5.2: K-L divergence between approximate value and exact value. NaN indicates that no approximate value is available. The corresponding graph is shown in Figure 5.4.


Figure 5.4: K-L divergence between approximate value and exact value. There is no clear relation between the K-L divergence and graph complexity. Most K-L divergence values are less than 10⁻², which means that the VN-method of the KLA-algorithm gives a good approximation.

(FB-method compared with junction tree algorithm)

The FB-method combines the conditioning method (forward propagation) with loopy belief propagation (backward propagation). Thus, its computation time is the sum of the computation times of the two propagation steps. The computation time of forward propagation depends on the size of the local loop cut-sets; hence, we can save time by keeping only loops of a few levels. The computation time of backward propagation depends on how long the potentials take to converge. That is, when we modify the system, the resulting potential values affect the convergence time. Therefore, the total computation time does not always decrease when fewer levels are kept.

In Figure 5.5, the computation time of the junction tree grows exponentially, whereas that of the others grows slowly. However, the computation time for the 105-node structure (the most complex) increases drastically, especially under the 1-level approximation. This is because the system is difficult to converge. The most stable approximation level is 2, which obtains the result in a few seconds in this figure. Notice that a high-complexity graph is not always difficult to converge; it depends on the structure and the parameter values. All in all, the computation time of the FB-method grows slowly under small-level approximations, so the method can handle the most complex system well.

Precision vs. graph complexity

(FB-method compared with junction tree algorithm)

Unlike the VN-method, the FB-method can update all the conditional probabilities in the network at once, not just that of the node of interest. However, we still look only at the final node in order to compare with the VN-method. The exact value is again obtained from the junction tree algorithm, and the exact value for the most complex structure is calculated by the VN-method with five levels kept. In Figure 5.6, we can see that all the K-L divergence values are less than 10⁻² and that the values under different level approximations are very similar. This means that the number of levels has little effect on the precision of the FB-method. This is because in backward propagation, the loopy

(Graph complexity)    G1(3.7505)   G2(5.1459)   G3(5.2319)   G4(5.8076)
Junction Tree         0.047        0.0531       0.0468       0.1906
KLA(FB) (7-level)     3.4167       220.073      215.5937     1.00E+03
KLA(FB) (5-level)     3.2397       44.9533      65.266       173
KLA(FB) (4-level)     2.6407       15.0363      25.625       32.891
KLA(FB) (3-level)     2.1877       6.9633       8.578        14.313
KLA(FB) (2-level)     1.8333       4.0937       5.547        8.578
KLA(FB) (1-level)     1.4113       3.1613       5.328        7.968

(Graph complexity)    G5(5.8854)   G6(6.2131)   G7(6.2464)
Junction Tree         0.4718       6.1968       Out of memory
KLA(FB) (7-level)     1.44E+03     4.04E+03     2.05E+04

Table 5.3: Computation time (seconds) of FB-method with different numbers of level approximations and junction tree algorithm. The corresponding graph is shown in Figure 5.5.


Figure 5.5: Computation time (seconds) of the FB-method with different numbers of level approximations and of the junction tree algorithm. The computation time is plotted on a logarithmic scale. The FB-method with all levels kept is replaced here by the FB-method with seven levels kept, since the computation time when all levels are kept is always intractable.

(Graph complexity)    G1(3.7505)   G2(5.1459)   G3(5.2319)   G4(5.8076)
KLA(FB) (7 level)     9.47E-04     4.22E-04     6.95E-04     0.0048
KLA(FB) (5 level)     9.47E-04     3.32E-04     0.001        0.0048
KLA(FB) (4 level)     1.50E-03     3.44E-04     3.79E-04     0.0048
KLA(FB) (3 level)     4.49E-04     1.87E-04     3.15E-05     0.0045
KLA(FB) (2 level)     4.41E-04     1.50E-04     0.0017       0.0054
KLA(FB) (1 level)     0.0038       1.03E-06     0.0098       0.0089

(Graph complexity)    G5(5.8854)   G6(6.2131)   G7(6.2464)
KLA(FB) (7 level)     1.26E-05     6.56E-05     7.00E-05
KLA(FB) (5 level)     1.38E-05     7.05E-05     3.09E-04
KLA(FB) (4 level)     3.17E-04     3.13E-04     9.03E-05
KLA(FB) (3 level)     3.35E-04     1.90E-04     1.89E-04
KLA(FB) (2 level)     9.18E-05     0.0012       5.82E-06
KLA(FB) (1 level)     7.39E-05     2.46E-04     5.16E-06

Table 5.4: K-L divergence between approximate value and exact value in the FB-method. NaN indicates that no approximate value is available. The corresponding graph is shown in Figure 5.6.

belief propagation attempts to find the convergence value of all clique potentials. Thus, the different approximate values from forward propagation are just different starting points for backward propagation and lead to different convergence paths. In other words, different level approximations change the convergence path and the convergence time, but the convergence value is determined by the system or some other factor.
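The claim that different starting points reach the same convergence value can be illustrated on a toy model. The sketch below is not the FB-method: it runs ordinary sum-product loopy belief propagation on a hypothetical pairwise model with three binary variables on a single loop, and all potentials, names, and tolerances are assumptions chosen for illustration. It starts the messages from two different initial values and compares the resulting beliefs.

```python
# Hypothetical toy model: three binary variables on one loop 0-1-2-0.
EDGES = [(0, 1), (1, 2), (2, 0)]
NODE_POT = {0: [0.6, 0.4], 1: [0.3, 0.7], 2: [0.5, 0.5]}
PAIR_POT = [[1.0, 0.5], [0.5, 1.0]]  # symmetric, favours agreement


def neighbours(i):
    return [b if a == i else a for a, b in EDGES if i in (a, b)]


def normalise(v):
    s = sum(v)
    return [x / s for x in v]


def loopy_bp(init, max_iters=200, tol=1e-10):
    """Run parallel sum-product updates; return (beliefs, iterations used)."""
    msgs = {}
    for a, b in EDGES:  # one message per direction, keyed (source, target)
        msgs[(a, b)] = normalise(list(init))
        msgs[(b, a)] = normalise(list(init))
    for it in range(1, max_iters + 1):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(2):
                total = 0.0
                for xi in range(2):
                    prod = NODE_POT[i][xi] * PAIR_POT[xi][xj]
                    for k in neighbours(i):
                        if k != j:  # exclude the message coming from j
                            prod *= msgs[(k, i)][xi]
                    total += prod
                out.append(total)
            new[(i, j)] = normalise(out)
        delta = max(abs(new[key][x] - msgs[key][x])
                    for key in msgs for x in range(2))
        msgs = new
        if delta < tol:
            break
    beliefs = {}
    for i in NODE_POT:
        b = list(NODE_POT[i])
        for k in neighbours(i):
            b = [b[x] * msgs[(k, i)][x] for x in range(2)]
        beliefs[i] = normalise(b)
    return beliefs, it


beliefs_a, iters_a = loopy_bp([0.9, 0.1])
beliefs_b, iters_b = loopy_bp([0.1, 0.9])
print(beliefs_a, iters_a)
print(beliefs_b, iters_b)
```

On this single-loop model with strictly positive potentials, the two runs should agree to within numerical tolerance, while the number of iterations may differ. This mirrors the observation above that the starting point influences the path and the time, not the value reached.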

Compared with Figure 5.4, the K-L divergence of the VN-method is smaller than that of the FB-method at some points. However, both methods give good approximations for these seven structures. Comparing the computation time of the two methods, we find that the VN-method needs less computation time than the FB-method. Thus, in terms of both precision and computation time, the VN-method performs better than the FB-method.

Why do we need the FB-method? First, it can update all of the conditional probabilities at once, so when we want to know the conditional probability of other nodes, we do not need to calculate them again. Second, the VN-method would have intractable computation time when a considerable amount of evidence is given, since the hidden node would have many parents and might cause many short loops. The following is the simulation when we change


Figure 5.6: K-L divergence between approximate value and exact value in the FB-method. There is no clear relation between the K-L divergence and graph complexity, and the K-L divergences under different level approximations are similar. All of the K-L divergence values are less than 10⁻², which means that the FB-method of the KLA-algorithm gives a good approximation.

the number of evidence nodes in the structure.

Different number of evidence nodes vs. computation time (FB-method compared with VN-method)

In Figure 5.7, we give different numbers of evidence nodes in the structure with 60 nodes, and use the approximate method with 2, 3, and 4 levels. The circles with thin lines are the results of the VN-method. These lines rise steeply as the number of evidence nodes increases. If the number of evidence nodes keeps increasing, the 2- or 3-level approximation might also become intractable. The diamonds with thick lines are the results of the FB-method. These lines remain nearly unchanged as the number of evidence nodes increases. We find that when the number of evidence nodes reaches 17, the FB-method performs better than the VN-method except in the case of the 2-level approximation. Thus, if the system receives only a little evidence, we can adopt the VN-method for the calculation. If the number of evidence nodes is large, or we want the conditional probabilities of all nodes, then the FB-method

Number of evidence nodes   evid = 2   evid = 5   evid = 8   evid = 11   evid = 14   evid = 17
KLA(VN) (2 level)          0.312      0.438      0.36       0.515       0.437       0.75
KLA(VN) (3 level)          1.344      1.671      1.735      6.688       9.406       42.672
KLA(VN) (4 level)          10.922     13.36      111.375    117.875     76.859      169.375
KLA(FB) (2 level)          7.453      7.25       7.672      6.125       6.609       7.547
KLA(FB) (3 level)          11.969     11.766     15.281     12.765      13.828      16.047
KLA(FB) (4 level)          23.594     24.734     27.578     27.61       28.578      39.891

Table 5.5: Computational time (sec) of different number of evidence nodes in two different methods of KLA-Algorithm. The corresponding graph is shown in Figure 5.7.

is appropriate.
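The choice between the two methods can be read directly off the timings in Table 5.5. The following sketch, with hypothetical variable and function names, copies the table values and reports, for each approximation level, the smallest tested number of evidence nodes at which the FB-method is faster than the VN-method.

```python
# Timings in seconds, copied from Table 5.5 (60-node structure).
EVIDENCE_COUNTS = [2, 5, 8, 11, 14, 17]
VN_SECONDS = {
    2: [0.312, 0.438, 0.36, 0.515, 0.437, 0.75],
    3: [1.344, 1.671, 1.735, 6.688, 9.406, 42.672],
    4: [10.922, 13.36, 111.375, 117.875, 76.859, 169.375],
}
FB_SECONDS = {
    2: [7.453, 7.25, 7.672, 6.125, 6.609, 7.547],
    3: [11.969, 11.766, 15.281, 12.765, 13.828, 16.047],
    4: [23.594, 24.734, 27.578, 27.61, 28.578, 39.891],
}


def first_crossover(level):
    """Smallest tested evidence count where FB beats VN, or None."""
    rows = zip(EVIDENCE_COUNTS, VN_SECONDS[level], FB_SECONDS[level])
    for n_evidence, vn_time, fb_time in rows:
        if fb_time < vn_time:
            return n_evidence
    return None


for level in (2, 3, 4):
    print(level, first_crossover(level))
```

For these measurements, the 4-level approximation crosses over at 8 evidence nodes and the 3-level approximation at 17, while the 2-level VN-method stays faster throughout the tested range.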

[Figure 5.7 plot: x-axis, number of evidence nodes in the 60-node structure (2 to 18); y-axis, computational time in seconds (logarithmic scale, 10⁻¹ to 10³); legend: 2 level (VN-method), 3 level (VN-method), 4 level (VN-method), 2 level (FB-method), 3 level (FB-method), 4 level (FB-method).]

Figure 5.7: Computational time for different numbers of evidence nodes in the two methods. The FB-method is drawn as diamonds with thick lines, and the VN-method as circles with thin lines. The FB-method has a more stable computing time than the VN-method. The y-axis is logarithmic.

In conclusion, the KLA-algorithm needs only a little memory space and can obtain the conditional probabilities efficiently by the approximate approach. Thus, we adopt the KLA-algorithm for the analysis of the real-world data.
