Data and Natural Language Technologies

A.Y. 2024/2025
Max ECTS: 6
Overall hours: 40
SSD: INF/01
Language: Italian
Learning objectives
- Provide students with in-depth knowledge of data and NLP technologies.
- Develop advanced skills in the use of tools and frameworks for data analysis.
- Deepen students' understanding of the principles of natural language processing and its applications.
- Train students in designing and implementing machine learning-based solutions.
- Enable students to design and implement study and research activities based on data and language analysis methods.
Expected learning outcomes
Upon completion of the course, students should be able to:
- Apply advanced concepts of data technologies in practical contexts;
- Use NLP models to analyze natural language and solve specific problems;
- Apply advanced machine learning techniques in various application contexts;
- Successfully complete complex application projects that integrate data and NLP technologies.
Single course

This course can be attended as a single course.

Course syllabus and organization

Single session

Responsible
Lesson period
Second semester
Course syllabus
Introduction to Data Science for the Humanities
- The multidisciplinary context of Data Science
- The data revolution and the evolution of artificial intelligence
- Challenges and open questions on the social and cultural impact of data technologies

Natural Language Processing (NLP)
- Introduction to the principles of natural language processing
- Artificial intelligence and natural language processing
- Difficulties and characteristics of natural language
- Limits of symbolic models
- The notion of language models and statistical models

Introduction to Machine Learning
- The paradigm shift from knowledge-based models to learning models
- Learning machines and learning models
- Unsupervised learning
- Reinforcement learning
- Supervised learning

Introduction to neural networks
- The learning mechanisms of neural networks
- Applications to natural language (word embeddings and non-contextual models)
- Neural language models
- Sequence-to-sequence learning: RNNs and LSTMs
- Encoder-decoder architectures, attention mechanisms and large language models

Legal and ethical issues related to generative artificial intelligence
- Transparency and explainability of generative models
- Hallucination and error
- Stereotypes and bias in generative models

Design
- Construction of a project for the application of generative models to problems of interest in humanistic studies
- Implementation of the project and collection of results
- Project presentation
Prerequisites for admission
In-depth knowledge of the specific technologies mentioned in the course is not required, as these will be covered during the lessons. However, some prior experience in programming and in managing and understanding data, together with motivation and interest in the applications of artificial intelligence and machine learning to the humanities and linguistics, will help students follow the lessons more effectively and achieve the expected learning objectives.
Teaching methods
The course consists of lectures that make extensive use of examples and supporting materials such as Python notebooks. Slides and handouts are used throughout the lectures and are published progressively on the course website on the Ariel platform and in the GitHub repository (https://github.com/afflint/tdl).
Teaching Resources
The course mainly uses notes, notebooks and materials provided by the teacher and published on the Ariel teaching site. These materials can be supplemented with the following suggested reading:
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 3rd edition, O'Reilly Media, 2022.
Assessment methods and Criteria
Development of a project. The topic of the project must be agreed with the teacher in advance. The project should demonstrate understanding of the lecture topics and the ability to propose and motivate innovative solutions to specific research problems.
The project will be evaluated through a discussion with the teacher on the project results and related topics. The evaluation will take into account both the project and the discussion.
The use of the SIFA service to participate in the exam is mandatory. After registering for an exam on SIFA, students are encouraged to contact the instructor to schedule the discussion.
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professor: Ferrara Alfio
Professor(s)
Office hours:
By appointment. Contact the professor by email first to arrange a meeting.
Meetings are held online. In-person meetings take place at the Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor).