
Athabasca University

Section 2: Bayesian learning and the EM algorithm

Commentary

Section Goals

  • To introduce several Bayesian learning methods, including maximum a posteriori (MAP) learning, maximum-likelihood (ML) learning, and Bayesian parameter learning, as well as a more advanced method, the expectation-maximization (EM) algorithm, which handles learning Bayesian networks with hidden variables.

Learning Objectives

Learning Objective 1

  • Describe the basic principles and formulae of Bayesian learning (summarized after this list).
  • Explain the principles of MAP, MDL, and ML learning and the relationships among them.
  • Discuss ML parameter learning for both discrete and continuous models, and discuss the issues related to Bayesian parameter learning, such as parameter independence.
  • Explain learning methods for naive Bayes models and Bayes net structures.
  • Discuss examples of EM algorithms, such as unsupervised clustering (learning mixtures of Gaussians), learning Bayesian networks with hidden variables, and learning hidden Markov models.
  • Outline the general form of the EM algorithm.
  • Explain the following concepts or terms:
    • Bayesian learning
    • Hypothesis prior
    • Likelihood
    • Maximum a posteriori (MAP)
    • Minimum description length (MDL)
    • Maximum-likelihood (ML)
    • Parameter learning
    • Log likelihood
    • Naive Bayes
    • Bayesian parameter learning
    • Parameter independence
    • Learning Bayes net structure
    • Hidden variable (or latent variable)
    • Expectation-maximization (EM)
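
The following formulas summarize how several of these concepts fit together, as developed in Section 20.1 of AIMA3ed; here \mathbf{d} denotes the observed data, h_i a hypothesis, and \alpha a normalizing constant:

    P(h_i \mid \mathbf{d}) = \alpha \, P(\mathbf{d} \mid h_i) \, P(h_i)       (posterior = likelihood x hypothesis prior, normalized)
    P(X \mid \mathbf{d}) = \sum_i P(X \mid h_i) \, P(h_i \mid \mathbf{d})     (Bayesian prediction averages over all hypotheses)
    h_{\mathrm{MAP}} = \mathrm{argmax}_h \, P(\mathbf{d} \mid h) \, P(h)      (MAP: predict from the single most probable hypothesis)
    h_{\mathrm{ML}} = \mathrm{argmax}_h \, P(\mathbf{d} \mid h)               (ML: MAP with a uniform hypothesis prior)
    L(h) = \log P(\mathbf{d} \mid h) = \sum_j \log P(d_j \mid h)              (log likelihood for i.i.d. observations)

Taking logarithms turns a product of many small probabilities into a sum, which is easier to differentiate and avoids numerical underflow. For example, if outcome 1 occurs N_1 times in N independent trials of a discrete model with parameter \theta, maximizing L(h) gives the ML estimate \hat{\theta}_{\mathrm{ML}} = N_1 / N.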

Objective Readings

Required readings:

Reading topics:

Statistical Learning; Learning with Complete Data; Learning with Hidden Variables: The EM Algorithm (see Sections 20.1-20.3 of AIMA3ed)

Zhang, N. L. (1996). Irrelevance and parameter learning in Bayesian networks. Artificial Intelligence, 88(1-2), 359-373.

Cheng, J., Greiner, R., Kelly, J., Bell, D., and Liu, W. (2002). Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence, 137(1-2), 43-90.

Supplemental Readings

Niculescu, R. S., Mitchell, T. M., and Rao, R. B. (2006). Bayesian network learning with parameter constraints. Journal of Machine Learning Research, 7(Jul), 1357-1383.

Objective Questions

  • Why is log likelihood sometimes used instead of likelihood in Bayesian learning?
  • How are approximation methods, such as MAP and ML, justified?
  • Why are naive Bayes models used more often than other kinds of Bayesian networks?
  • Why do we say that Bayesian learning formulates learning as a form of probabilistic inference?
  • What is the purpose of each of the main steps in the general EM algorithm? (See the sketch after this list.)
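
As a concrete illustration of the E-step and M-step asked about above, here is a minimal sketch of EM for a mixture of two univariate Gaussians, written in Python with NumPy. The synthetic data, initialization, and iteration count are illustrative assumptions, not part of the course materials.

    # Minimal EM sketch for a mixture of two univariate Gaussians.
    # Data, initialization, and iteration count are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic data from two Gaussians; the component labels are hidden.
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])

    # Initial guesses for mixing weights, means, and standard deviations.
    w, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

    def gaussian_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    for step in range(100):
        # E-step: responsibilities p(component k | x_j) under the current parameters.
        dens = np.stack([w[k] * gaussian_pdf(x, mu[k], sigma[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)

        # Log likelihood under the current parameters; EM never decreases it.
        log_lik = np.log(dens.sum(axis=0)).sum()

        # M-step: re-estimate parameters as responsibility-weighted ML estimates.
        n_k = resp.sum(axis=1)
        w = n_k / len(x)
        mu = (resp * x).sum(axis=1) / n_k
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / n_k)

    print(w, mu, sigma, log_lik)

Note that the M-step applies the standard weighted ML formulas; EM increases the log likelihood monotonically but may converge only to a local maximum.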

Objective Activities

  • Search for and explore open-source software packages that include statistical learning algorithms. Choose one to test, and report your findings about the software to the course conference. (A brief example appears after this list.)
  • Complete Exercise 20.2 of AIMA3ed.
  • Complete Exercise 20.10 of AIMA3ed.
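
As one example of the kind of package the first activity asks about, the open-source scikit-learn library implements EM for Gaussian mixtures. This short sketch, in which the synthetic data and the choice of package are illustrative assumptions, fits a two-component mixture and reports the estimated parameters.

    # Fitting a Gaussian mixture with scikit-learn's EM implementation.
    # The synthetic data and the choice of package are illustrative assumptions.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2.0, 1.0, (300, 1)),
                        rng.normal(3.0, 1.5, (200, 1))])

    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gm.weights_)        # estimated mixing weights
    print(gm.means_.ravel())  # estimated component means
    print(gm.score(X))        # average per-sample log likelihood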

Updated November 17, 2015, by FST Course Production Staff