Statistics for Big Data for Economics and Business | Università degli Studi di Milano Statale

A.Y. 2021/2022

Max ECTS: 6
Overall hours: 40
SSD: SECS-S/03
Language: Italian

Included in the following degree programmes

Economics and Management (Classes L-18/L-33) - Enrolled from the 2018/2019 academic year

Learning objectives

This course introduces and illustrates specific statistical, IT and machine learning methodologies for the analysis of Big Data in economic, business and financial applications. The course focuses mainly on the Python programming language, which is by far the most widely used in Big Data applications, but some parts are devoted to the R language and to other, more classical languages such as Java. On the statistical side, supervised and unsupervised statistical learning topics are covered, with some reference to Bayesian statistics.

Expected learning outcomes

At the end of the course, students will have acquired the statistical and programming skills needed to master the tools for analysing Big Data and for extracting information of interest in the economic, business and financial fields.

Lesson period: Third trimester

Assessment methods: Exam
Assessment result: mark recorded on a scale of thirty

Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Emergency remote teaching

Teaching methods.
Classes will be held both in person and synchronously on the Microsoft Teams platform.

Syllabus and reference material.
The syllabus and the reference material will not change if classes return to being held entirely in person.

Verification of learning and evaluation criteria.
The exam will take place in person and will consist of a multiple-choice test.

In particular, the exam aims to:
- ensure that the objectives in terms of knowledge and understanding have been achieved;
- ascertain the ability to apply knowledge and understanding through the discussion of specific cases in which topics of the course are applied;
- verify the student's independence of judgement on the topics of the course.

Syllabus

FIRST PART:
1) DATA SCIENCE TECHNIQUES: supervised models
1.1 Multiple linear regression model
1.2 Generalized linear models (logit, probit and tobit)
2) DATA SCIENCE TECHNIQUES: unsupervised models
2.1 Cluster analysis
2.2 Principal components analysis
2.3 Cross-validation
2.4 Text mining
SECOND PART:
1) Introduction to programming in R and Python for statistical and economic applications
2) Introduction to cloud computing
3) Introduction to web scraping
4) Introduction to relational and non-relational databases
5) Introduction to Hadoop for big data processing
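
As an illustration of items 1.1 and 2.3 of the first part, the following is a minimal sketch in Python of a multiple linear regression fitted by least squares and evaluated with k-fold cross-validation. It is not taken from the official course material: the data are synthetic, the function names (fit_ols, predict, kfold_mse) are hypothetical, and only NumPy is assumed to be installed.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients of y on X, with the intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta, X):
    """Fitted values for the coefficient vector returned by fit_ols."""
    return beta[0] + X @ beta[1:]


def kfold_mse(X, y, k=5, seed=0):
    """Average out-of-sample mean squared error over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ols(X[train], y[train])
        errors.append(np.mean((y[test] - predict(beta, X[test])) ** 2))
    return float(np.mean(errors))


# Synthetic data (an assumption for illustration): y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta = fit_ols(X, y)           # estimated intercept and slopes
cv_mse = kfold_mse(X, y, k=5)  # cross-validated prediction error
print("coefficients:", beta)
print("5-fold CV MSE:", cv_mse)
```

The cross-validated error is computed only on observations left out of each fit, so it estimates how well the model predicts new data rather than how closely it fits the sample.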

Prerequisites for admission

Knowledge of basic statistical and mathematical techniques. Knowledge of some programming techniques is useful but not essential.

Teaching methods

Classes will actively involve the students, especially in the programming part. Students will often be invited to follow along (also on their personal laptops) as computer programs are worked through step by step in the classroom together with the teacher, in a "what-if" approach. They will also work in groups to share and strengthen their active learning.

Teaching Resources

James, Witten, Hastie, Tibshirani (2013). An Introduction to Statistical Learning. Springer.
Wiktorski (2019). Data-intensive Systems. Springer.
Sosinsky (2010). Cloud Computing Bible. Wiley.
Raschka, Mirjalili (2013). Python Machine Learning.

Assessment methods and Criteria

The exam will consist of a multiple-choice test. During the course, some assignments will be set (both to be completed in the classroom and to be handed in shortly afterwards), and these will contribute to the final score.

Course structure

SECS-S/03 - ECONOMIC STATISTICS - University credits: 6

Lessons: 40 hours

Professor: Manzi Giancarlo