Section 1: Reinforcement Learning

Commentary

Section Goals

To introduce the basic concepts and methods of reinforcement learning.
To discuss several algorithms and applications of reinforcement learning.

Learning Objectives

Learning Objective 1

Outline the problem description and representation of reinforcement learning.
Explain how reinforcement learning makes use of and calculates rewards and utilities.
Describe the TD, ADP, and Q-learning algorithms.
Discuss possible applications of reinforcement learning.
Explain the following concepts or terms:
- Reinforcement learning
- Active learning
- Passive learning
- Q-learning
- Direct utility estimation
- Adaptive dynamic programming (ADP)
- Temporal difference (TD)
- Exploration function
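
For orientation, the three algorithms named above can be summarized by their core equations. This is a sketch in the notation commonly used in Chapter 21 of AIMA3ed, under these assumptions: U^π(s) is the utility of state s under a fixed policy π, R(s) is the reward in state s, γ is the discount factor, α is the learning rate, and s' is the observed successor state. Consult the chapter for the full agent programs.

```latex
% ADP: learn the model P and R from experience, then solve the
% fixed-policy Bellman equation for the utilities
U^{\pi}(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U^{\pi}(s')

% TD: no model; nudge U(s) toward the value implied by the
% single observed transition s -> s'
U^{\pi}(s) \leftarrow U^{\pi}(s) + \alpha \bigl( R(s) + \gamma\, U^{\pi}(s') - U^{\pi}(s) \bigr)

% Q-learning: the TD idea applied to action values, so no model
% is needed for action selection either
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl( R(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \bigr)
```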

Objective Readings

Required readings:

Reading topics:

Reinforcement Learning (see Chapter 21 of AIMA3ed)

Supplemental Readings

Sutton, R. S., and Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Price, B., and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research, 19, 569-629.

Objective Questions

What are the main differences between the ADP and TD approaches, with respect to algorithms and performance?

Objective Activities

Search the Internet for three papers about reinforcement learning that are at your level of knowledge, and discuss them in the online course conference.
From the textbook's website, explore the source code for the following reinforcement learning algorithms covered in this section:
- Passive-ADP-Agent
- Passive-TD-Agent
- Q-Learning-Agent
Complete Exercise 21.2 of AIMA3ed.
Complete Exercise 21.10 of AIMA3ed.
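
As a warm-up before reading the textbook's Q-Learning-Agent code, the following is a minimal tabular Q-learning sketch in Python. It is not the textbook's implementation. The five-state corridor environment, the function names (step, q_learning, greedy_policy), and all parameter values are hypothetical and chosen only for illustration. Two simplifications are assumed: the reward is attached to the transition rather than to the state, and exploration uses a simple epsilon-greedy rule in place of the exploration function described in AIMA3ed.

```python
import random
from collections import defaultdict

# Hypothetical environment: a corridor of states 0..4.
# Reaching state 4 ends the episode with reward +1; every other move costs 0.04.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.04
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to its estimated value
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # A terminal state has no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Temporal-difference update toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    learned = q_learning()
    print(greedy_policy(learned))
```

Because the agent never builds a transition model, the same loop works unchanged if step is replaced by any other environment with the same return signature. Comparing this with the Passive-ADP-Agent code, which must first estimate the model, is one way to approach the objective question above.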

Athabasca University

Study Guide

Computer Science 657: Artificial Intelligence: Principles and Techniques (Rev. 1)

Unit 11: Reinforcement Learning, Deep Learning and MultiAgent Learning