Section 2: Markov decision processes (MDPs)
Commentary
Section Goals
- To introduce the principles and techniques for solving sequential decision problems.
Learning Objectives
- Give examples of sequential decision problems.
- Define Markov decision process (MDP) and partially observable MDP (POMDP), and explain their components and representations.
- Describe the value iteration and policy iteration algorithms for solving MDPs.
- Describe the basic idea of dynamic decision networks.
- Explain the following concepts or terms:
- Sequential decision problem
- Markov decision process (MDP)
- Policy and optimal policy
- Discounted rewards
- Bellman equation
- Bellman update (both are written out after this list)
- Partially observable MDPs (POMDPs)
- Dynamic decision network (DDN)
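For reference, the two Bellman items admit a compact statement. In the notation of AIMA3ed (utility U(s), reward R(s), transition model P(s' | s, a), discount factor gamma), the Bellman equation and the Bellman update at the heart of value iteration are:

$$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$$

$$U_{i+1}(s) \leftarrow R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U_i(s')$$

Value iteration applies the update to every state on each pass until the utilities stabilize; an optimal policy then chooses, in each state, the action that maximizes the expected utility of the successor state.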
Objective Readings
Required reading: Sections 17.1-17.4 of AIMA3ed.
Reading topics: Sequential Decision Problems, Value Iteration, Policy Iteration, POMDPs, and Decision-Theoretic Agents.
Supplemental Readings
Itoh, H., and Nakamura, K. (2007). Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence, 171(8-9), 453-490.
Objective Questions
- What is the decision cycle of a POMDP agent?
- What is the difference between a dynamic Bayesian network (DBN) and a dynamic decision network (DDN)?
Objective Activities
- Explore the source code for the value iteration and policy iteration algorithms, available on the textbook's website (a runnable sketch follows this list):
- Value-Iteration
- Policy-Iteration
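As a point of comparison while exploring that code, the following is a minimal, self-contained Python sketch of both algorithms. The dictionary-based MDP representation and the two-state example are illustrative assumptions, not the textbook's actual interfaces.

```python
# Minimal sketch of value iteration and policy iteration, assuming a
# dictionary-based MDP: P[s][a] maps successor states to probabilities,
# R[s] is the reward for being in s, gamma is the discount factor.
# Illustrative only; not the textbook's actual code.

def value_iteration(states, actions, P, R, gamma=0.9, epsilon=1e-6):
    """Compute utilities by iterating the Bellman update to convergence."""
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        U_next = {}
        for s in states:
            # Bellman update: U(s) <- R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')
            best = max(sum(p * U[s2] for s2, p in P[s][a].items())
                       for a in actions[s])
            U_next[s] = R[s] + gamma * best
            delta = max(delta, abs(U_next[s] - U[s]))
        U = U_next
        if delta <= epsilon * (1 - gamma) / gamma:  # standard stopping bound
            return U

def policy_iteration(states, actions, P, R, gamma=0.9, k=30):
    """Alternate simplified policy evaluation (k sweeps) with improvement."""
    pi = {s: actions[s][0] for s in states}   # arbitrary initial policy
    U = {s: 0.0 for s in states}
    while True:
        for _ in range(k):                    # policy evaluation
            U = {s: R[s] + gamma * sum(p * U[s2]
                                       for s2, p in P[s][pi[s]].items())
                 for s in states}
        unchanged = True                      # policy improvement
        for s in states:
            best_a = max(actions[s],
                         key=lambda a: sum(p * U[s2]
                                           for s2, p in P[s][a].items()))
            if best_a != pi[s]:
                pi[s], unchanged = best_a, False
        if unchanged:
            return pi, U

# Tiny two-state example (hypothetical): moving to state "b" pays off.
states = ["a", "b"]
actions = {"a": ["stay", "go"], "b": ["stay", "go"]}
P = {"a": {"stay": {"a": 1.0}, "go": {"b": 1.0}},
     "b": {"stay": {"b": 1.0}, "go": {"a": 1.0}}}
R = {"a": 0.0, "b": 1.0}
print(value_iteration(states, actions, P, R))
print(policy_iteration(states, actions, P, R))
```

The simplified policy evaluation here (a fixed number of sweeps rather than solving the linear system exactly) corresponds to the modified policy iteration variant discussed in Section 17.3 of AIMA3ed.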
- Complete Exercise 17.4 of AIMA3ed.
- Complete Exercise 17.10 of AIMA3ed.