Section 2: Markov decision processes (MDPs)
Commentary
Section Goals
- To introduce the principles and techniques for solving sequential decision problems.
Learning Objectives
- Give examples of sequential decision problems.
- Define Markov decision process (MDP) and partially observable MDP (POMDP), and explain their components and representations.
- Describe the value iteration and policy iteration algorithms for solving MDPs.
- Describe the basic idea of dynamic decision networks.
- Explain the following concepts or terms:
- Sequential decision problem
- Markov decision process (MDP)
- Policy and optimal policy
- Discounted rewards
- Bellman equation
- Bellman update (both are written out after this list)
- Partially observable MDPs (POMDPs)
- Dynamic decision network (DDN)
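For reference, the two Bellman items admit a compact statement. In the notation of AIMA3ed (utility U(s), reward R(s), transition model P(s' | s, a), discount factor gamma), the Bellman equation and the Bellman update at the heart of value iteration are:

$$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$$

$$U_{i+1}(s) \leftarrow R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U_i(s')$$

Value iteration applies the update to every state on each pass until the utilities stabilize; an optimal policy then chooses, in each state, the action that maximizes the expected utility of the successor state.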
Objective Readings
Required reading: Sections 17.1-17.4 of AIMA3ed.
Reading topics: Sequential Decision Problems, Value Iteration, Policy Iteration, POMDPs, and Decision-Theoretic Agents.
Supplemental Readings
Itoh, H., and Nakamura, K. (2007). Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence, 171(8-9), 453-490.
Objective Questions
- What is the decision cycle of a POMDP agent?
- What is the difference between a dynamic Bayesian network (DBN) and a dynamic decision network (DDN)?
Objective Activities
- Explore the source code for the value iteration and policy iteration algorithms, available on the textbook's website (a runnable sketch follows this list):
- Value-Iteration
- Policy-Iteration
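As a point of comparison while exploring that code, the following is a minimal, self-contained Python sketch of both algorithms. The dictionary-based MDP representation and the two-state example are illustrative assumptions, not the textbook's actual interfaces.

```python
# Minimal sketch of value iteration and policy iteration, assuming a
# dictionary-based MDP: P[s][a] maps successor states to probabilities,
# R[s] is the reward for being in s, gamma is the discount factor.
# Illustrative only; not the textbook's actual code.

def value_iteration(states, actions, P, R, gamma=0.9, epsilon=1e-6):
    """Compute utilities by iterating the Bellman update to convergence."""
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        U_next = {}
        for s in states:
            # Bellman update: U(s) <- R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')
            best = max(sum(p * U[s2] for s2, p in P[s][a].items())
                       for a in actions[s])
            U_next[s] = R[s] + gamma * best
            delta = max(delta, abs(U_next[s] - U[s]))
        U = U_next
        if delta <= epsilon * (1 - gamma) / gamma:  # standard stopping bound
            return U

def policy_iteration(states, actions, P, R, gamma=0.9, k=30):
    """Alternate simplified policy evaluation (k sweeps) with improvement."""
    pi = {s: actions[s][0] for s in states}   # arbitrary initial policy
    U = {s: 0.0 for s in states}
    while True:
        for _ in range(k):                    # policy evaluation
            U = {s: R[s] + gamma * sum(p * U[s2]
                                       for s2, p in P[s][pi[s]].items())
                 for s in states}
        unchanged = True                      # policy improvement
        for s in states:
            best_a = max(actions[s],
                         key=lambda a: sum(p * U[s2]
                                           for s2, p in P[s][a].items()))
            if best_a != pi[s]:
                pi[s], unchanged = best_a, False
        if unchanged:
            return pi, U

# Tiny two-state example (hypothetical): moving to state "b" pays off.
states = ["a", "b"]
actions = {"a": ["stay", "go"], "b": ["stay", "go"]}
P = {"a": {"stay": {"a": 1.0}, "go": {"b": 1.0}},
     "b": {"stay": {"b": 1.0}, "go": {"a": 1.0}}}
R = {"a": 0.0, "b": 1.0}
print(value_iteration(states, actions, P, R))
print(policy_iteration(states, actions, P, R))
```

The simplified policy evaluation here (a fixed number of sweeps rather than solving the linear system exactly) corresponds to the modified policy iteration variant discussed in Section 17.3 of AIMA3ed.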
- Complete Exercise 17.4 of AIMA3ed.
- Complete Exercise 17.10 of AIMA3ed.