Laboratory "Reinforcement Learning"

Academic year: 2022/2023
Maximum credits: 3
Total hours: 20
SSD: INF/01
Language: English
Learning objectives
This Lab is offered within the Data Science for Economics (DSE) degree program.
Only a small number of students can be admitted due to logistical constraints.
Students (both DSE and non-DSE) must apply for admission. Candidates will be selected by the participating institutions/companies based on their CV and motivation.
To apply, students must respond to a call posted on this website: https://dse.cdl.unimi.it/en/courses/laboratories
The call is typically published a few weeks before the Lab starts.

This laboratory provides an overview of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents that take actions and interact with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems and has been applied to a wide range of tasks, including self-driving cars, game playing, customer management, and healthcare.
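The agent-environment interaction described above can be illustrated with a minimal sketch. It is not taken from the course material: the `CoinFlipEnvironment` class, the `run_agent` function, the reward probabilities, and the epsilon-greedy rule are all assumptions chosen for illustration. The agent does not know the reward probabilities and has to estimate them from the rewards it observes.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical environment: each action pays reward 1 with a hidden probability."""

    def __init__(self, success_probs, seed=0):
        self._probs = success_probs  # hidden from the agent
        self._rng = random.Random(seed)

    def step(self, action):
        # Return a random 0/1 reward for the chosen action.
        return 1.0 if self._rng.random() < self._probs[action] else 0.0


def run_agent(env, n_actions, n_steps=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent: mostly exploit the best estimate, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * n_actions    # how often each action was taken
    means = [0.0] * n_actions   # running average reward per action
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore uniformly at random
        else:
            action = max(range(n_actions), key=lambda a: means[a])  # exploit
        reward = env.step(action)
        counts[action] += 1
        # Incremental update of the average reward of the chosen action.
        means[action] += (reward - means[action]) / counts[action]
    return means, counts


env = CoinFlipEnvironment([0.2, 0.5, 0.8])
means, counts = run_agent(env, n_actions=3)
print("estimated mean rewards:", [round(m, 2) for m in means])
print("times each action was taken:", counts)
```

Under these assumptions the agent should end up taking the third action most often, since it has the highest hidden reward probability.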
Expected learning outcomes
Upon completion of the course, students will be able to:
- understand Markov Decision Processes (MDPs),
- understand some basic learning algorithms for MDPs,
- run experiments in simulated environments.
These objectives are measured via a combination of two components: the project report and the oral discussion. The final grade is formed by assessing the project report and then using the oral discussion for fine-tuning.
Single course

This course cannot be attended as a single course. You can find the available courses by consulting the single courses catalogue.

Course syllabus and organization

Single session

Period
Second trimester

Course syllabus
1 Fundamentals
1.1 Markov Decision Processes and Bellman optimality equations
1.2 Value iteration and policy iteration
1.3 Linear programming formulation
1.4 Sample complexity
2 Exploration
2.1 Multi-armed bandits
2.2 Efficient exploration in tabular MDPs
2.3 Linear bandits
2.4 Efficient exploration in linearly parameterized MDPs
3 Policy optimization
3.1 Policy gradient methods
3.2 Regularized methods
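As an illustration of items 1.1 and 1.2 above, the following is a minimal sketch of value iteration, which repeatedly applies the Bellman optimality update V(s) = max_a sum_{s'} P(s'|s,a) [r + gamma V(s')] until the values stop changing. It is not taken from the course notes: the two-state MDP, the function name `value_iteration`, the discount factor, and the tolerance are assumptions made for this example.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Value iteration on a tabular MDP.

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns the approximate optimal values and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Action values under the current value estimate.
        Q = [[sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]  # Bellman optimality update
        delta = max(abs(v1 - v0) for v1, v0 in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the last computed action values.
    policy = [max(range(len(Q[s])), key=lambda a: Q[s][a])
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: action 0 = "stay", action 1 = "move" (deterministic).
# Staying in state 1 pays reward 1; every other transition pays 0.
toy_mdp = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0: stay, move
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # state 1: stay, move
]

V, policy = value_iteration(toy_mdp)
print("optimal values:", [round(v, 3) for v in V])
print("greedy policy:", policy)
```

For this toy MDP with gamma = 0.9, the values can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and moving from state 0 to state 1 is worth 0.9 × 10 = 9.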
Prerequisites
The course requires basic knowledge of calculus, linear algebra, and statistics.
Knowledge of the Python programming language is also required.
Teaching methods
Lectures with worked examples.
Reference material
Shie Mannor, Yishay Mansour, and Aviv Tamar
Reinforcement Learning: Foundations
(Working Draft: https://sites.google.com/view/rlfoundations/home)

Lecture notes and Jupyter notebooks provided by the instructor.
Assessment methods and criteria
Experimental project. The project will be evaluated through a discussion that will also cover theoretical topics taught in the course. The final grade will take into account both the project and the oral exam.
INF/01 - COMPUTER SCIENCE - Credits (CFU): 3
Lectures: 20 hours
Instructor(s)
Office hours:
By appointment
via Celoria 18, Room 7007