Algorithms for Massive Data, Cloud and Distributed Computing
A.Y. 2022/2023
Learning objectives
The course introduces the fundamental concepts underlying the management and analysis of massive data. On one side, it covers the main techniques for processing data at massive scale and their implementation on distributed computational frameworks; on the other, it covers the technologies and solutions underlying the cloud computing paradigm and modern distributed systems (e.g., microservice architectures).
The course will also study the security and privacy risks arising in public and semi-public data release and in emerging scenarios (e.g., the cloud), illustrating solutions aimed at mitigating these risks.
Expected learning outcomes
The student will have knowledge and understanding of the main approaches enabling the processing of massive amounts of data, as well as the operating principles of modern distributed computing systems, including cloud computing and microservice-based architectures. The student will acquire the ability to design and execute computations on massive datasets. The student will be able to identify privacy risks in data publication and in outsourcing scenarios, and to propose and evaluate solutions able to mitigate such risks.
Lesson period: Second trimester
Assessment methods: Exam
Assessment result: grade recorded on a 30-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
Prerequisites for admission
None
Assessment methods and Criteria
The exam consists of two tests.
For the "Security for Cloud Computing" unit (items 1 to 7) and the "Algorithms for massive datasets" unit (items 1 to 4), the exam consists of a written test (1 hour and 30 minutes) with questions and exercises covering the topics of the syllabus. The questions and exercises assess the student's knowledge and understanding of the course material.
For the "Algorithms for massive datasets" unit (items 5 to 10), the exam consists of a project and an oral test, both related to the topics covered in the course. The project, described in a report, requires processing one or more datasets through the critical application of the techniques presented in class. The oral test, open only to students whose project has received a positive evaluation, consists of a discussion of some topics covered in the course and in-depth questions about the submitted project.
The final grade is expressed on a 1-30 scale and takes into account the evaluation obtained in each of the two tests.
The results of the exams are available on the Ariel web page of the course.
Module Cloud Computing and Algorithms for Massive Data
Course syllabus
1- Cloud Computing Fundamentals
a. Service and models, technologies, and case studies
b. Migration to the cloud, cloudonomics, challenges and issues
c. Non-functional aspects of the cloud
2- Big Data Platforms-as-a-Service
3- Microservice Architecture Fundamentals
a. Overview and basic concepts
b. Microservice migration and orchestration
4- Microservices and Big Data: Model-Based Big Data Analytics-as-a-Service
5- Technical preliminaries
6- HDFS, MapReduce algorithms, Spark
7- Link analysis
8- Finding similar items
9- Frequent itemsets
10- Recommendation systems
Teaching methods
Lectures.
Teaching Resources
Web site:
https://sforestiamdcdc.ariel.ctu.unimi.it/
Slides and reading lists made available on the course web site.
Textbook:
* Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press (ISBN:9781107015357).
Suggested readings:
* Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark. Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN:978-1-449-35862-4)
* Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark. Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN:978-1-491-91276-8)
Module Security for Cloud Computing
Course syllabus
1- Introduction to security and privacy
2- Authentication and access control
3- Macrodata and microdata protection
4- Privacy in data publication
5- Data protection in emerging scenarios
6- Data confidentiality and integrity in the cloud
7- Access confidentiality and integrity in the cloud
Teaching methods
Lectures.
Teaching Resources
Web site:
https://sforestiamdcdc.ariel.ctu.unimi.it/
Slides and reading lists made available on the course web site.
Module Cloud Computing and Algorithms for Massive Data
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professors:
Bodini Matteo, Malchiodi Dario
Module Security for Cloud Computing
INF/01 - INFORMATICS - University credits: 6
Lessons: 40 hours
Professor:
Foresti Sara
Professor(s)
Office hours:
By appointment
Room 37 (3rd floor) or Microsoft Teams