Reinforcement Learning

A.Y. 2024/2025
Max ECTS: 6
Overall hours: 40
SSD: INF/01
Language: English
Learning objectives
This course introduces the theoretical and algorithmic foundations of Reinforcement Learning, the subfield of Machine Learning studying adaptive agents that take actions and interact with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems, and has been applied to a wide range of tasks, including self-driving cars, game playing, customer management, and healthcare.
Expected learning outcomes
Upon completion of the course, students will be able to:
- formalize problems in terms of Markov Decision Processes,
- understand basic methods of strategic exploration,
- understand algorithms for direct policy optimization,
- run experiments in simulated environments.
These objectives are assessed through two components: the project report and the oral discussion. The final grade is determined by assessing the project report and is then adjusted on the basis of the oral discussion.
Single course

This course can be attended as a single course.

Course syllabus and organization

Single session

Responsible
Lesson period
Second trimester
Course syllabus
Topics will be:
Introduction
What is reinforcement learning
Deterministic decision processes
Markov decision processes
Evaluation criteria: finite horizon, infinite horizon, discounted horizon
Markov policies and their properties
Finite horizon
State-value function
Action-value function
Bellman optimality equations for finite horizon
Discounted horizon
Bellman optimality equations for discounted horizon
Value iteration
Policy iteration
Linear programming interpretation
Model-based reinforcement learning
Model-free reinforcement learning
Q-learning
SARSA
Temporal difference algorithms
TD(0)
TD(λ)
Equivalence between forward and backward view
Value function approximation
Policy gradient methods
Developing a reinforcement learning project
Prerequisites for admission
Knowledge of statistical methods, machine learning, and Python programming.
Teaching methods
The course consists of lectures that make extensive use of examples and supporting materials such as Python notebooks. Slides and handouts are used throughout the lectures and are published progressively on the course website on the Ariel platform.
Lecture attendance is not mandatory, but it is strongly recommended.
Teaching Resources
Notes, notebooks and materials provided by the lecturers and published on the Ariel website of the course.
Assessment methods and Criteria
Upon completion of the course, students will be able to:
- formalize problems in terms of Markov Decision Processes,
- describe the basic performance criteria for MDPs,
- understand the main algorithms for model-based and model-free RL,
- understand the main RL approaches in large state spaces,
- run experiments in simulated environments.
These objectives are assessed through two components: the project report and the oral discussion. The final grade is determined by assessing the project report and is then adjusted on the basis of the oral discussion. The grade is expressed on a scale of 0 to 30.
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professor(s)
Reception:
By appointment
Via Celoria 18, Room 7007
Reception:
By appointment. Contact the professor by email first; meetings are normally held online.
Online. For in-person meetings: Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor)