Laboratory "reinforcement learning"
Academic year 2022/2023
Learning objectives
This Lab is provided within the Data Science for Economics (DSE) degree program.
Only a small number of students can be admitted because of logistical constraints.
Students (either DSE or non-DSE) must apply for admission. Candidates will be selected by the participating institutions and companies on the basis of their CV and motivation.
To apply, students must respond to a call posted on this website: https://dse.cdl.unimi.it/en/courses/laboratories
The call is typically published a few weeks before the Lab starts.
This laboratory provides an overview of Reinforcement Learning, the subfield of Machine Learning that studies adaptive agents that take actions and interact with an unknown environment. Reinforcement learning is a powerful paradigm for the study of autonomous AI systems and has been applied to a wide range of tasks, including self-driving cars, game playing, customer management, and healthcare.
Expected learning outcomes
Upon completion of the course, students will be able to:
- understand Markov Decision Processes (MDPs),
- understand some basic learning algorithms for MDPs,
- run experiments in simulated environments.
These objectives are measured via a combination of two components: the project report and the oral discussion. The final grade is formed by assessing the project report and then using the oral discussion for fine-tuning.
Period: Second trimester
Assessment type: Pass/fail judgement
Assessment outcome: passed/not passed
Single course
This course cannot be attended as a single course. You can find the available courses by consulting the single courses catalogue.
Syllabus and course organization
Single session
Course coordinator
Period
Second trimester
Syllabus
1 Fundamentals
1.1 Markov Decision Processes and Bellman optimality equations
1.2 Value iteration and policy iteration
1.3 Linear programming formulation
1.4 Sample complexity
2 Exploration
2.1 Multi-armed bandits
2.2 Efficient exploration in tabular MDPs
2.3 Linear bandits
2.4 Efficient exploration in linearly parameterized MDPs
3 Policy optimization
3.1 Policy gradient methods
3.2 Regularized methods
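As an illustration of topic 1.2 only, here is a minimal sketch of value iteration in Python. It is not taken from the course material: the function name `value_iteration`, the two-state `toy_mdp`, the discount factor 0.9, and the stopping tolerance are all assumptions made for this example.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Approximate the optimal value function of a finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward) triples.
    Returns the value estimates and a greedy policy (one action per state).
    """
    n_states = len(transitions)
    values = [0.0] * n_states

    def q_value(s, a, v):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * v[s2]) for p, s2, r in transitions[s][a])

    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        new_values = [
            max(q_value(s, a, values) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(nv - v) for nv, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(transitions[s])), key=lambda a, s=s: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Hypothetical toy MDP: two states, two actions (0 = stay, 1 = move).
# Staying in state 1 pays reward 1; every other transition pays 0.
toy_mdp = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # state 1
]

values, policy = value_iteration(toy_mdp)
print(values, policy)
```

In this toy problem the agent should move from state 0 to state 1 and then stay there, so the greedy policy picks action 1 in state 0 and action 0 in state 1.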
Prerequisites
The course requires basic knowledge of calculus, linear algebra, and statistics.
Knowledge of the Python programming language is also required.
Teaching methods
Lectures with worked examples.
Reference material
Shie Mannor, Yishay Mansour, and Aviv Tamar
Reinforcement Learning: Foundations
(Working Draft: https://sites.google.com/view/rlfoundations/home)
Lecture notes and Jupyter notebooks provided by the lecturer.
Assessment methods and criteria
Experimental project. The project will be evaluated through a discussion that will also cover theory topics taught in the course. The final grade will take into account both the project and the oral exam.
Lecturer(s)