Data and Natural Language Technologies
A.Y. 2024/2025
Learning objectives
- Provide students with in-depth knowledge of data and NLP technologies.
- Develop advanced skills in the use of tools and frameworks for data analysis.
- Deepen students' understanding of natural language processing principles and their applications.
- Train students in designing and implementing machine learning-based solutions.
- Enable students to design and implement study and research activities based on data and language analysis methods.
Expected learning outcomes
Upon completion of the course, students should be able to:
- Apply advanced concepts of data technologies in practical contexts;
- Use NLP models to analyze natural language and solve specific problems;
- Apply advanced machine learning techniques in various application contexts;
- Successfully complete complex application projects that integrate data and NLP technologies.
Lesson period: Second semester
Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Course syllabus
Introduction to Data Science for the Humanities
- The multidisciplinary context of Data Science
- The data revolution and the evolution of artificial intelligence
- Challenges and open questions on the social and cultural impact of data technologies
Natural Language Processing (NLP)
- Introduction to the principles of natural language processing
- Artificial intelligence and natural language processing
- Difficulties and characteristics of natural language
- Limits of symbolic models
- The notion of Language Models and statistical models
Introduction to Machine Learning
- The paradigm shift from knowledge-based models to learning models
- Learning machines and learning models
- Unsupervised learning
- Reinforcement learning
- Supervised learning
Introduction to neural networks
- The learning mechanisms of neural networks
- Applications to natural language (word embedding and non-contextual models)
- Neural language models
- Sequence-to-sequence learning: RNN and LSTM
- Encoder-decoder architectures, attention mechanisms and large language models
Legal and ethical issues related to generative artificial intelligence
- Transparency and explainability of generative models
- Hallucination and error
- Stereotypes and bias in generative models
Design
- Construction of a project for the application of generative models to problems of interest in humanistic studies
- Implementation of the project and collection of results
- Project presentation
Prerequisites for admission
In-depth knowledge of the specific technologies mentioned in the course is not required, as these will be covered during the lessons. However, some prior knowledge of programming and of data management and understanding, together with motivation and interest in the applications of artificial intelligence and machine learning to the humanities and linguistics, will help students follow the lessons more effectively and achieve the expected learning objectives.
Teaching methods
The course is delivered through lectures that make extensive use of examples and support materials such as Python notebooks. Slides and handouts are used throughout the lectures and are published progressively on the course website on the Ariel platform and in the GitHub repository (https://github.com/afflint/tdl).
Teaching Resources
The course mainly uses notes, notebooks and materials provided by the teacher and published on the Ariel teaching site. These materials can be supplemented with the following suggested reading:
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media; 3rd edition (November 8, 2022)
Assessment methods and Criteria
Assessment is based on the development of a project. The topic of the project must be agreed with the teacher in advance. The project should demonstrate understanding of the lecture topics and the ability to propose and motivate innovative solutions to specific research problems.
The project will be evaluated through a discussion with the teacher on the project results and related topics. The evaluation will take into account both the project and the interview.
The use of the SIFA service to participate in the exam is mandatory. After registering for an exam on SIFA, students are encouraged to contact the instructor to schedule the discussion.
Professor(s)
Reception:
By appointment, arranged by first contacting the professor by email.
Meetings are held online. In-person meetings take place at the Department of Computer Science, via Celoria 18, Milano, Room 7012 (7th floor).