Laboratory "Reinforcement Learning" | Università degli Studi di Milano Statale

A.Y. 2022/2023

Max ECTS: 3

Overall hours: 20

SSD: INF/01

Language: English

Included in the following degree programmes

Data Science and Economics (Class LM-91) - enrolled until the 2021/2022 academic year

Data Science for Economics (Class LM-Data) - enrolled from the 2022/2023 academic year

Learning objectives

This Lab is provided within the Data Science for Economics (DSE) degree program.
Only a small number of students can be admitted, due to logistical constraints.
Students (whether DSE or non-DSE) must apply for admission. Candidates will be selected by the institutions/companies involved on the basis of their CV and motivation.
To apply, students must respond to a call posted on this website: https://dse.cdl.unimi.it/en/courses/laboratories
The call is typically published a few weeks before the Lab starts.

This laboratory provides an overview of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents that take actions and interact with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems, and has been applied to a wide range of tasks, including self-driving cars, game playing, customer management, and healthcare.
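
As a rough illustration of an agent that takes actions and learns from an unknown environment, the sketch below runs an epsilon-greedy learner on a Bernoulli multi-armed bandit. It is not taken from the course material: the function name `run_epsilon_greedy`, the arm means, and the parameter values are all assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(means, steps=1000, epsilon=0.1, seed=0):
    """Interact with a Bernoulli bandit whose arm means are hidden from the agent.

    `means` plays the role of the unknown environment (illustrative values only).
    Returns the agent's reward estimates, the pull counts, and the total reward.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms          # number of times each arm was pulled
    estimates = [0.0] * n_arms     # running average reward of each arm
    total_reward = 0.0

    for _ in range(steps):
        # Agent: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Environment: draw a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0

        # Agent: update the running average for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = run_epsilon_greedy([0.2, 0.5, 0.8], steps=2000)
    print("estimates:", est)
    print("pull counts:", cnt)
    print("total reward:", total)
```

The agent never reads `means` directly; it only sees the rewards of the arms it pulls, which is the sense in which the environment is unknown.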

Expected learning outcomes

Upon completion of the course students will be able to:
- understand Markov Decision Processes (MDPs),
- understand some basic learning algorithms for MDPs,
- run experiments in simulated environments.
These objectives are measured via a combination of two components: the project report and the oral discussion. The final grade is formed by assessing the project report, and then using the oral discussion for fine-tuning.

Lesson period: Second trimester

Lessons timetable

Assessment methods: Pass/fail evaluation
Assessment result: passed/not passed

Exams calendar

Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Responsible

Cesa Bianchi Nicolo' Antonio

Syllabus

Course syllabus

1 Fundamentals
1.1 Markov Decision Processes and Bellman optimality equations
1.2 Value iteration and policy iteration
1.3 Linear programming formulation
1.4 Sample complexity
2 Exploration
2.1 Multi-armed bandits
2.2 Efficient exploration in tabular MDPs
2.3 Linear bandits
2.4 Efficient exploration in linearly parameterized MDPs
3 Policy optimization
3.1 Policy gradient methods
3.2 Regularized methods
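
As an illustration of topics 1.1 and 1.2, the following minimal sketch applies value iteration, that is, repeated Bellman optimality backups, to a toy two-state MDP. The MDP, the `transitions` encoding as lists of `(probability, next_state, reward)` triples, and the function name `value_iteration` are assumptions made for this example, not material from the course.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Value iteration for a finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward) triples
    (a hypothetical encoding chosen for this sketch).
    Returns the approximate optimal values and a greedy policy.
    """
    n_states = len(transitions)
    values = [0.0] * n_states

    def q_value(s, a, v):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * v[s2]) for p, s2, r in transitions[s][a])

    for _ in range(max_iters):
        # Bellman optimality backup: V(s) <- max_a Q(s, a).
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(transitions[s])), key=lambda a: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Toy MDP (illustrative): in state 0, action 1 moves to state 1 with reward 1;
# in state 1, action 0 stays put with reward 2. All other moves give reward 0.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: action 0 = stay, action 1 = go
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: action 0 = stay, action 1 = go
]

if __name__ == "__main__":
    v, pi = value_iteration(TOY_MDP, gamma=0.5)
    print("values:", v)
    print("policy:", pi)
```

With discount factor 0.5, solving the Bellman optimality equations by hand gives V(1) = 2 + 0.5 V(1) = 4 and V(0) = 1 + 0.5 V(1) = 3, so the greedy policy is "go" in state 0 and "stay" in state 1.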

Prerequisites for admission

The course requires basic knowledge of calculus, linear algebra, and statistics.
Knowledge of the Python programming language is also required.

Teaching methods

Lecture-style instruction with worked-out examples.

Teaching Resources

Shie Mannor, Yishay Mansour, and Aviv Tamar
Reinforcement Learning: Foundations
(Working Draft: https://sites.google.com/view/rlfoundations/home)

Lecture notes and Jupyter notebooks provided by the instructor.

Assessment methods and Criteria

Assessment is based on an experimental project. The project will be evaluated through an oral discussion, which will also include questions on the theory covered in the course. The final grade will take into account both the project and the oral examination.

Course structure

INF/01 - INFORMATICS - University credits: 3

Lessons: 20 hours

Professor: Cesa Bianchi Nicolo' Antonio

Professor(s)

Cesa Bianchi Nicolo' Antonio

Reception:

By appointment

18, via Celoria. Room 7007