Data and Natural Language Technologies | Università degli Studi di Milano Statale

A.Y. 2024/2025

Max ECTS: 6

Overall hours: 40

SSD: INF/01

Language: Italian

Included in the following degree programmes

Modern Humanities - Enrolled from 2024/2025

Philology, Literature and History of Antiquity (Class LM-15) - Enrolled from 2023/2024

Learning objectives

- Provide students with in-depth knowledge of data and NLP technologies.
- Develop advanced skills in the use of tools and frameworks for data analysis.
- Deepen students' understanding of natural language processing principles and their applications.
- Train students in designing and implementing machine learning-based solutions.
- Enable students to design and implement study and research activities based on data and language analysis methods.

Expected learning outcomes

Upon completion of the course, students should be able to:
- Apply advanced concepts of data technologies in practical contexts;
- Use NLP models to analyze natural language and solve specific problems;
- Apply advanced machine learning techniques in various application contexts;
- Successfully complete complex application projects that integrate data and NLP technologies.

Lesson period: Second semester

Assessment methods: Exam
Assessment result: grade recorded on a thirty-point scale

Single course

This course can be attended as a single course.

Course syllabus and organization

Single session

Responsible

Ferrara Alfio

Course syllabus

Introduction to Data Science for the Humanities
- The multidisciplinary context of Data Science
- The data revolution and the evolution of artificial intelligence
- Challenges and open questions on the social and cultural impact of data technologies

Natural Language Processing (NLP)
- Introduction to the principles of natural language processing
- Artificial intelligence and natural language processing
- Difficulties and characteristics of natural language
- Limits of symbolic models
- The notion of Language Models and statistical models
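
As an illustration of the statistical language models listed above, the following is a minimal sketch of a bigram model that estimates the probability of the next word from relative frequencies. It is not taken from the course materials: the function name `train_bigram_model`, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | word) by relative frequency (hypothetical helper)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Whitespace tokenisation with sentence boundary markers (an assumption).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    # Normalise the counts of each context word into a probability distribution.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


# Toy corpus invented for the example.
corpus = [
    "the poet writes a letter",
    "the poet reads a poem",
    "the scribe writes a poem",
]
model = train_bigram_model(corpus)
print(model["the"])
```

A model of this kind only sees one word of context, which is one way to motivate the neural language models covered later in the syllabus.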

Introduction to Machine Learning
- The paradigm shift from knowledge-based models to learning models
- Learning machines and learning models
- Unsupervised learning
- Reinforcement learning
- Supervised learning

Introduction to neural networks
- The learning mechanisms of neural networks
- Applications to natural language (word embeddings and non-contextual models)
- Neural language models
- Sequence-to-sequence learning: RNN and LSTM
- Encoder-decoder architectures, attention mechanisms and large language models
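
To give a concrete idea of the attention mechanisms mentioned in the list above, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is an illustration under stated assumptions, not the course's own implementation: the function names `softmax` and `attention` and the small example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (hypothetical helper)."""
    dim = len(query)
    # Similarity of the query to each key, scaled by the square root of the dimension.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    out_dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(out_dim)
    ]
    return weights, output


# Small vectors invented for the example.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

Keys that are more similar to the query receive larger weights, so the corresponding values contribute more to the output.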

Legal and ethical issues related to generative artificial intelligence
- Transparency and explainability of generative models
- Hallucination and error
- Stereotypes and bias in generative models

Design
- Construction of a project for the application of generative models to problems of interest in humanistic studies
- Implementation of the project and collection of results
- Project presentation

Prerequisites for admission

In-depth knowledge of the specific technologies mentioned in the course is not required, as these will be covered during the lessons. However, some prior knowledge of programming and data management, together with motivation and interest in the applications of artificial intelligence and machine learning in the humanities and linguistics, will help students follow the lessons more effectively and achieve the expected learning objectives.

Teaching methods

The course consists of lectures that make extensive use of examples and supporting materials such as Python notebooks. Slides and handouts are used throughout the lectures and are published progressively on the course website on the Ariel platform and in the GitHub repository (https://github.com/afflint/tdl).

Teaching Resources

The course mainly uses notes, notebooks and materials provided by the teacher and published on the Ariel teaching site. These materials can be supplemented with the following suggested reading:
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media; 3rd edition (November 8, 2022)

Assessment methods and Criteria

The assessment consists of the development of a project. The project topic must be agreed with the teacher in advance. The project should demonstrate understanding of the lecture topics and the ability to propose and motivate innovative solutions to specific research problems.
The project is evaluated through a discussion with the teacher about the project results and related topics. The evaluation takes into account both the project and the discussion.
The use of the SIFA service to participate in the exam is mandatory. After registering for an exam on SIFA, students are encouraged to contact the instructor to schedule the discussion.

Course structure

INF/01 - INFORMATICS - University credits: 6

Lessons: 40 hours

Professor: Ferrara Alfio

Educational website(s)

Tecnologie dei dati e del linguaggio (a.a. 2024/25)

Professor(s)

Ferrara Alfio

Office hours:

By appointment, arranged by contacting the professor by email. Meetings are normally held online.

In-person meetings take place at the Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor).