Inference simulation - Application to Ozone Level Detection Data Set

5.2 Application to Ozone Level Detection Data Set

5.2.3 Inference simulation

In this section, we discuss a series of inferences performed on the test data. We then simulate a prediction procedure that starts at the beginning of a day. Some of the data are time-sequenced, so only part of the data can be collected at first. We examine the performance of the model under different sets of evidence nodes.

Test data result:

The test data have missing values in different attributes. The attributes with missing values are usually adjacent, such as a run of hourly temperatures or wind speed resultants. The performance of the model in this situation is shown in the following tables. We test all levels and the 4-, 3-, and 2-level approximations of the conditional probabilities computed by the FB-method.

Figure 5.12: The precision corresponding to different values of v_E (horizontal axis: value of the threshold v_E, from 0 to 0.2; vertical axis: precision, from 0 to 1). The highest precision is 0.76, obtained when v_E is between 0.04 and 0.16.

Note that the prefix “o_” denotes observed data and the prefix “p_” denotes predicted data. “CE” denotes the cross entropy error.
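The quantities named above can be illustrated with a minimal sketch. It assumes that a day is labelled an ozone day when its predicted ozone probability reaches the threshold v_E, that "precision" means the fraction of correctly classified days, and that CE is the mean binary cross entropy; none of these definitions is stated in this section, and the function names and the five example days are hypothetical.

```python
import math

def classify(p_ozone, v_e):
    """Label a day 'ozone' when its predicted ozone probability reaches v_e (assumed rule)."""
    return ["ozone" if p >= v_e else "normal" for p in p_ozone]

def precision(observed, predicted):
    """Fraction of days whose predicted label matches the observed label (assumed definition)."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)

def cross_entropy_error(observed, p_ozone, eps=1e-12):
    """Mean binary cross entropy between observed labels and predicted probabilities."""
    total = 0.0
    for o, p in zip(observed, p_ozone):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        y = 1.0 if o == "ozone" else 0.0
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(observed)

# Hypothetical observed labels (o_) and predicted ozone probabilities for five days.
o_labels = ["normal", "normal", "ozone", "normal", "ozone"]
p_ozone = [0.01, 0.05, 0.12, 0.30, 0.02]

# Sweep the threshold, in the spirit of Figure 5.12.
for v_e in (0.04, 0.10, 0.16):
    p_labels = classify(p_ozone, v_e)
    print(v_e, precision(o_labels, p_labels))
print("CE:", cross_entropy_error(o_labels, p_ozone))
```

Sweeping v_e over a finer grid and plotting the resulting precision would give a curve of the kind shown in Figure 5.12.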

Table 5.7 shows the results of the different level approximations. The all-level result is the exact result from the junction tree algorithm. Whichever number of levels is used, the approximation gives the same result, and its performance is very close to the exact result. This is because the loopy belief propagation in the FB-method converges to the same values for the different level approximations in this case. Thus, the number of levels can be chosen arbitrarily.

Prediction on a new day:

From the beginning of a day, we start collecting data, and new data arrive every hour. However, it is of no use to learn that today is an ozone day after the day is over. Thus, we predict hourly and keep revising our prediction as data are collected. In this situation, inference in Bayesian networks plays an important role. Suppose that, except for the hourly data such as T0 to T23 and WSR0 to WSR23, all other data have been collected over the past 24 hours. The following figures show the hourly predicted probability of an ozone day for six randomly chosen days from the test data.

Table 5.7: Precision and cross entropy error of the different level approximations. All level is the exact result. (Recovered header row: All level, p_normal, p_ozone, Total.)

In Figure 5.13, the three graphs on the left are normal-day cases and the three graphs on the right are ozone-day cases. In the normal-day case, if the probability of an ozone day is below v_E at the beginning, we obtain a good prediction, since the ozone-day probability keeps decreasing. However, if the ozone-day probability is near v_E, it is difficult to predict whether the day is an ozone day until 11 am. This is because the wind speed resultant at 11 am is the only attribute connected to the target node (ozone or normal day). Once the information from 11 am is known, the other information adds nothing for the target node, which is why the probability of an ozone day stays the same after 11 am. In the ozone-day case, the probability of an ozone day is close to v_E at the beginning. The first ozone-day graph is a misclassified case: the probability decreases and leads to a wrong prediction.

However, in the other two graphs, the probability stays near the threshold v_E all the time, and thus we obtain a good prediction. Overall, the prediction is most accurate after 11 am.

Nevertheless, most of the time we can correctly predict whether the day is an ozone day before 11 am. From this simulation, we conclude that the wind speed resultant at 11 am is a key factor of the ozone level, since the precision is high in the ozone detection problem.
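The behaviour described above, where the prediction changes every hour until 11 am and is constant afterwards, can be reproduced on a toy model. The sketch below is not the network learned in this thesis: it assumes, purely for illustration, that the hourly wind-speed states form a two-state Markov chain and that the target node depends only on the state at 11 am. All probabilities, names, and the example day are hypothetical.

```python
# Toy model (all numbers hypothetical): hourly wind-speed states W0..W23 form a
# two-state Markov chain, and the target O (ozone day) depends only on W11.
PRIOR_W0 = [0.7, 0.3]                 # P(W0 = low), P(W0 = high)
TRANSITION = [[0.9, 0.1],             # P(W[t+1] | W[t] = low)
              [0.2, 0.8]]             # P(W[t+1] | W[t] = high)
P_OZONE_GIVEN_W11 = [0.30, 0.02]      # P(O = ozone | W11 = low), P(O = ozone | W11 = high)
KEY_HOUR = 11

def step(dist):
    """Push a distribution over W[t] one hour forward through the chain."""
    return [sum(dist[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]

def p_ozone(observed):
    """P(O = ozone | W0..Wt), where `observed` lists the states seen so far."""
    if len(observed) > KEY_HOUR:
        # W11 is observed: it separates O from every other hourly node.
        return P_OZONE_GIVEN_W11[observed[KEY_HOUR]]
    if observed:
        # Markov property: only the latest observation matters for W11.
        dist = [0.0, 0.0]
        dist[observed[-1]] = 1.0
        steps = KEY_HOUR - (len(observed) - 1)
    else:
        dist, steps = list(PRIOR_W0), KEY_HOUR
    for _ in range(steps):
        dist = step(dist)
    return sum(dist[w] * P_OZONE_GIVEN_W11[w] for w in range(2))

def hourly_predictions(day):
    """Posterior probability of an ozone day after each hour's reading arrives."""
    return [p_ozone(day[: t + 1]) for t in range(len(day))]

day = [0] * 8 + [1] * 4 + [0] * 12    # hypothetical 24-hour record (0 = low, 1 = high)
probs = hourly_predictions(day)
for hour, p in enumerate(probs):
    print(hour, round(p, 4))
```

In this toy model the printed probability moves while hours 0 to 10 arrive and is fixed from hour 11 onward, because every later reading is conditionally independent of the target given the 11 am state.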

Figure 5.13: Probability of an ozone day for six randomly chosen days from the test data. The left column shows the normal-day cases and the right column shows the ozone-day cases. The dashed line is the threshold v_E.

Chapter 6 Conclusions

In this thesis, we have considered the learning of Bayesian networks, namely structure learning, parameter learning, and inference algorithms, and developed the KLA-algorithm for large-scale Bayesian networks. We also presented an application of large-scale networks. In the following paragraphs, we summarize the present work and draw several important conclusions.

1. We have presented the BN power constructor algorithm for constructing large-scale Bayesian networks. The BN power constructor builds the network structure efficiently. Its parameter can be tuned to obtain structures of different complexity, and model selection can then identify the best one.

2. We have described the parameter learning methods, which can easily estimate the conditional probability tables in Bayesian networks.

3. We developed the KLA-algorithm, which always runs in tractable computational time, at the cost of a trade-off with precision. The number of levels to keep can be selected on the basis of the structure and the performance. The memory required by the KLA-algorithm is the smallest compared with the other inference algorithms. This advantage extends large-scale Bayesian networks to some applications with limited resources.

4. The simulation results in Section 5.1 show that both the VN-method and the FB-method of the KLA-algorithm obtain good approximations, with a low K-L divergence from the exact result of the junction tree algorithm. The VN-method calculates the conditional probability of a single node of interest and takes only a short time when the number of evidence nodes is small. The FB-method calculates the conditional probabilities of all nodes. Thus, it may take more time, but its computation time is stable across different numbers of evidence nodes.

5. The application in Section 5.2 shows that large-scale Bayesian networks using the KLA-algorithm with different level approximations achieve high precision in classifying ozone days. We also simulated the missing-data case and the prediction for a new day by performing inference in the Bayesian network, and still obtained good results in these cases.
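Item 4 above measures approximation quality by the K-L divergence from the exact junction tree result. A minimal sketch of that comparison for a single discrete node follows; the direction D_KL(exact || approx), the function name, and the example distributions are assumptions made for illustration and are not taken from the experiments.

```python
import math

def kl_divergence(exact, approx, eps=1e-12):
    """D_KL(exact || approx) for two discrete distributions over the same states."""
    if len(exact) != len(approx):
        raise ValueError("distributions must have the same number of states")
    total = 0.0
    for p, q in zip(exact, approx):
        if p > 0.0:  # terms with p = 0 contribute nothing
            total += p * math.log(p / max(q, eps))
    return total

# Hypothetical posteriors of one three-state node.
exact = [0.70, 0.20, 0.10]    # e.g. from an exact junction tree computation
approx = [0.68, 0.21, 0.11]   # e.g. from a level-limited approximation
print(kl_divergence(exact, approx))
```

A value close to zero indicates that the approximate posterior is nearly indistinguishable from the exact one; averaging the divergence over all query nodes gives a single score per method.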

Bayesian networks using the inference algorithm proposed here are computationally efficient and require less memory than traditional algorithms. Future research includes how to perform inference in continuous-type and mixed-type systems, and how to handle networks in which even the minimum-sized clique exceeds the available memory.

