Athabasca University

Section 2: Markov decision processes (MDPs)

Commentary

Section Goals

  • To introduce the principles and techniques for solving sequential decision problems.

Learning Objectives

Learning Objective 1

  • Give examples of sequential decision problems.
  • Define the terms Markov decision process (MDP) and partially observable MDP (POMDP), and explain their components and representations.
  • Describe the value iteration and policy iteration algorithms for solving MDPs (minimal sketches of both appear below).
  • Describe the basic idea of dynamic decision networks.
  • Explain the following concepts or terms:
    • Sequential decision problem
    • Markov decision process (MDP)
    • Policy and optimal policy
    • Discounted rewards
    • Bellman equation
    • Bellman update
    • Partially observable MDPs (POMDPs)
    • Dynamic decision network (DDN)
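
To make the Bellman update concrete before you start the readings, here is a minimal value-iteration sketch in Python on a made-up two-state MDP. The transition model, rewards, discount factor, and tolerance are all illustrative choices; this is not the code from the textbook's website.

    # Minimal value iteration on a made-up two-state MDP.
    # P[s][a] is a list of (probability, next_state) pairs.
    P = {
        0: {'stay': [(1.0, 0)], 'go': [(0.8, 1), (0.2, 0)]},
        1: {'stay': [(1.0, 1)], 'go': [(0.8, 0), (0.2, 1)]},
    }
    R = {0: 0.0, 1: 1.0}   # reward for being in each state
    gamma = 0.9            # discount factor

    def value_iteration(P, R, gamma, eps=1e-6):
        # Repeat the Bellman update
        #   U(s) <- R(s) + gamma * max_a sum_{s'} P(s'|s,a) U(s')
        # until the largest change in any utility is small enough.
        U = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            U_next = {}
            for s in P:
                U_next[s] = R[s] + gamma * max(
                    sum(p * U[s2] for p, s2 in P[s][a]) for a in P[s])
                delta = max(delta, abs(U_next[s] - U[s]))
            U = U_next
            if delta <= eps * (1 - gamma) / gamma:
                return U

    print(value_iteration(P, R, gamma))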

Objective Readings

Required readings: Sections 17.1–17.4 of AIMA3ed.

Reading topics: sequential decision problems, value and policy iteration, POMDPs, and decision-theoretic agents.

Supplemental Readings

Itoh, H., and Nakamura, K. (2007). Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence, 171(8–9), 453–490.

Objective Questions

  • What is the decision cycle of a POMDP agent? (A belief-update sketch follows these questions.)
  • What is the difference between a dynamic Bayesian network (DBN) and a dynamic decision network (DDN)?
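
As background for the first question: in each cycle, a POMDP agent executes an action, receives a percept, updates its belief state, and then chooses its next action based on the new belief. The sketch below shows only the belief-update step, b'(s') = alpha * O(e|s') * sum_s P(s'|s,a) b(s), on a made-up two-state POMDP; the models and names are illustrative.

    # Belief-state update for a POMDP agent's decision cycle.
    P = {0: {'go': {0: 0.3, 1: 0.7}},      # P[s][a][s2] = P(s2 | s, a)
         1: {'go': {0: 0.7, 1: 0.3}}}
    O = {0: {'bright': 0.9, 'dark': 0.1},  # O[s][e] = P(e | s)
         1: {'bright': 0.2, 'dark': 0.8}}

    def update_belief(b, a, e):
        # b'(s') = alpha * O(e|s') * sum_s P(s'|s,a) * b(s)
        b_new = {s2: O[s2][e] * sum(P[s][a][s2] * b[s] for s in b)
                 for s2 in b}
        alpha = sum(b_new.values())        # normalizing constant
        return {s2: v / alpha for s2, v in b_new.items()}

    b = {0: 0.5, 1: 0.5}                   # uniform initial belief
    print(update_belief(b, 'go', 'bright'))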

Objective Activities

  • Explore the following source code for the value and policy iteration algorithms, available on the textbook's website (a companion policy-iteration sketch follows this list):
    • Value-Iteration
    • Policy-Iteration
  • Complete Exercise 17.4 of AIMA3ed.
  • Complete Exercise 17.10 of AIMA3ed.
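
For comparison with the value-iteration sketch above, here is a minimal policy-iteration sketch on the same made-up two-state MDP. It alternates a few sweeps of iterative policy evaluation with greedy policy improvement; it is a companion illustration, not the code from the textbook's website.

    # Minimal policy iteration on the same made-up two-state MDP.
    P = {
        0: {'stay': [(1.0, 0)], 'go': [(0.8, 1), (0.2, 0)]},
        1: {'stay': [(1.0, 1)], 'go': [(0.8, 0), (0.2, 1)]},
    }
    R = {0: 0.0, 1: 1.0}
    gamma = 0.9

    def policy_iteration(P, R, gamma, sweeps=50):
        def q(s, a, U):
            # Expected utility of doing a in s, then following U.
            return R[s] + gamma * sum(p * U[s2] for p, s2 in P[s][a])

        pi = {s: sorted(P[s])[0] for s in P}   # arbitrary initial policy
        U = {s: 0.0 for s in P}
        while True:
            # Iterative policy evaluation under the fixed policy pi.
            for _ in range(sweeps):
                U = {s: q(s, pi[s], U) for s in P}
            # Greedy policy improvement.
            changed = False
            for s in P:
                best = max(P[s], key=lambda a: q(s, a, U))
                if q(s, best, U) > q(s, pi[s], U):
                    pi[s] = best
                    changed = True
            if not changed:
                return pi, U

    print(policy_iteration(P, R, gamma))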

Updated November 17, 2015 by FST Course Production Staff