Statistics and Data Analysis
A.Y. 2024/2025
Learning objectives
The course aims to introduce the fundamentals of descriptive statistics, probability, and parametric inferential statistics.
Expected learning outcomes
Students will be able to carry out basic exploratory analyses and inferences on datasets. They will know the main probability distributions and be able to understand statistical analyses conducted by others. They will also know simple methods for binary classification and be able to evaluate their performance. Finally, students will acquire the fundamental competences needed to study more sophisticated techniques for data analysis and data modeling.
Lesson period: Second semester
Assessment methods: Exam
Assessment result: mark recorded on a 30-point scale
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Course syllabus
Introduction to Python.
Descriptive statistics:
- Frequencies and cumulative frequencies. Joint and marginal frequencies.
- Indices of centrality, dispersion, correlation, heterogeneity, and concentration.
- Graphical methods: frequency and cumulative frequency plots, scatter plots, and QQ plots.
- Classifiers and ROC curves.
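As an illustration of the last topic, the following is a minimal sketch of how a ROC curve can be built for a threshold classifier. It is not taken from the course material: the function names `roc_points` and `auc`, the convention of predicting the positive class when the score is at least the threshold, and the toy data are all assumptions made for this example.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for a classifier that predicts positive
    when score >= threshold, using each distinct score as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: classifier scores and true binary labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(auc(curve))  # 8/9, about 0.889, for this toy data
```

The area under the curve summarizes the classifier's performance in a single number: 1 corresponds to a perfect ranking of positives above negatives, 0.5 to a random one.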
Probability:
- Combinatorics. Basics of set theory.
- Probability axioms.
- Conditional probability and related theorems.
- Discrete and continuous random variables. Centrality and dispersion indices for random variables and their properties.
- Multivariate random variables. Covariance and correlation indices for random variables.
- Independent events and independent random variables.
- Markov and Chebyshev inequalities.
- Bernoulli, binomial, geometric, Poisson, discrete uniform and hypergeometric models.
- Continuous uniform, exponential and Gaussian models.
- Poisson process.
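To make the Markov and Chebyshev inequalities listed above concrete, the sketch below compares both bounds with exact tail probabilities of a binomial random variable. The parameters (n = 20, p = 0.5) and the helper name `binomial_pmf` are assumptions chosen for illustration, not part of the course material.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability that a Binomial(n, p) random variable equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Hypothetical parameters for the example.
n, p = 20, 0.5
mean = n * p                 # E[X] = np
variance = n * p * (1 - p)   # Var(X) = np(1 - p)

# Chebyshev: P(|X - E[X]| >= eps) <= Var(X) / eps^2
eps = 5
exact_two_sided = sum(binomial_pmf(k, n, p)
                      for k in range(n + 1) if abs(k - mean) >= eps)
chebyshev_bound = variance / eps**2

# Markov (X is non-negative): P(X >= a) <= E[X] / a
a = 15
exact_upper = sum(binomial_pmf(k, n, p) for k in range(a, n + 1))
markov_bound = mean / a

print(f"Chebyshev: exact {exact_two_sided:.4f} <= bound {chebyshev_bound:.4f}")
print(f"Markov:    exact {exact_upper:.4f} <= bound {markov_bound:.4f}")
```

Both bounds hold but are far from tight here, which is expected: they use only the mean (Markov) or the mean and variance (Chebyshev), not the full distribution.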
Parametric inferential statistics:
- Population, random sample and point estimates.
- Sample mean. Central limit theorem.
- Sample variance.
- Unbiasedness and consistency in mean square.
- Methods for deriving estimators.
- Law of large numbers.
- Computation of the sample size.
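As an example of the last item, the sketch below computes the sample size needed to estimate a population mean within a given margin of error, in two ways: with the normal approximation justified by the central limit theorem, and with the more conservative Chebyshev inequality. It assumes a known population standard deviation; the function names and numeric values are hypothetical and are not taken from the course material.

```python
import math
from statistics import NormalDist


def sample_size_clt(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin, where z is the
    standard normal quantile for the two-sided confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_chebyshev(sigma, margin, confidence=0.95):
    """Smallest n with sigma^2 / (n * margin^2) <= 1 - confidence,
    from Chebyshev's inequality applied to the sample mean."""
    return math.ceil(sigma**2 / (margin**2 * (1 - confidence)))


# Hypothetical values: standard deviation 10, margin of error 2, 95% confidence.
n_clt = sample_size_clt(sigma=10.0, margin=2.0)
n_cheb = sample_size_chebyshev(sigma=10.0, margin=2.0)
print(n_clt)   # 97
print(n_cheb)  # 500
```

The Chebyshev-based size is much larger because the inequality makes no assumption about the shape of the distribution of the sample mean.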
Prerequisites for admission
Students must have passed the "Matematica del continuo" (calculus) exam. The course also requires knowledge of the main topics of computer programming, and having passed the "Matematica del discreto" (discrete mathematics) exam is strongly recommended.
Teaching methods
Lectures and exercise sessions. Lecture attendance is strongly advised.
Teaching Resources
Suggested textbooks:
- S. Ross, Introductory statistics, Academic Press, 2010, ISBN 9788838786020
- S. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 5th edition, Academic Press, 2014, ISBN 9780123743886
Lecture notes (for topics not covered in the suggested textbooks) and sample code available at the course Web pages:
- https://labonline.ctu.unimi.it/
- https://malchiodi.di.unimi.it/teaching/SAD/
Assessment methods and Criteria
The exam consists of a written and an oral test, both related to the topics covered in the course. The written test, lasting two and a half hours, is based on open-ended questions and on the analysis of a dataset through the appropriate application of the statistical techniques described in class. Its evaluation, a pass/fail mark sent to students via mail, takes into account the level of mastery of the topics and the correct use of mathematical formalism.
The oral test, open only to students who have passed the written test, is based on a discussion of the written test answers and on questions concerning topics covered in the course. Its evaluation, expressed on a scale from 0 to 30, takes into account the level of mastery of the topics, clarity, language skills, and the correct use of technical terminology.
INF/01 - INFORMATICS - University credits: 6
Practicals: 36 hours
Lessons: 24 hours
Professor:
Malchiodi Dario