Databases and exposure scenarios
Academic year 2024/2025
Learning objectives
The course is organized into two parts, namely "Informatics and databases" and "Statistics applied to epidemiology".
The "Informatics and databases" part of the course aims to provide the basic concepts of databases and database management systems, with a focus on relational data modeling and the SQL query language. To develop a deeper understanding of relational data organization in real contexts, example relational schemas of biological databases, and SQL queries that extract data from them, are presented and discussed.
As regards statistics, the course aims to provide the fundamental concepts of descriptive and inferential statistics and of epidemiological study design, together with concrete tools for applying the main statistical techniques to real cases. At the end of the course, students should demonstrate knowledge and understanding of the main statistical techniques for describing and analysing the phenomena under study and of the basic principles for setting up an epidemiological study; be able to apply the knowledge acquired and to interpret the results of statistical analyses; and have developed the skills needed to continue studying statistical analysis and epidemiology independently.
Expected learning outcomes
Regarding informatics and databases, students are expected to understand relational database schemas and languages and to describe the meaning, properties, relationships, and constraints of the data stored in a database. Students will be able to apply the concepts, models, and languages introduced in the course to formulate SQL queries over a database schema, with appropriate conditions to filter and retrieve the data that satisfy specific user needs, including on real biological databases.
As regards statistics, students are expected to have assimilated the concepts presented in the course and to be able to critically compare the use of different statistical tests and study designs. In addition, students will develop the basic skills needed to design an epidemiological study and carry out its statistical analysis.
Period: First semester
Assessment method: Exam
Grading: grade recorded out of thirty
Single course
This course can be attended as a single course.
Course syllabus and organization
Single edition
Course coordinator
Period
First semester
Syllabus
"Informatics and Databases"
Introduction to databases. Information systems, information, and data. Database and Database Management System (DBMS). Data models. Schemas and instances. Abstraction levels in DBMSs. Database languages and users.
Relational databases. The relational model. Relations and tables. Relations with attributes. Relations and databases. Incomplete information and null values. Integrity constraints. Definitions and properties of keys. Primary key and foreign key constraints.
Query languages for relational databases: SQL. Basic SQL query format. Selection and projection queries. Join queries (inner join, natural join, outer joins). Aggregate queries. Group by queries. Set (union, intersection, difference) queries. Nested queries. Correlated nested queries.
Conceptual data modeling with the Entity-Relationship model.
Working with a real database example storing protein-related data. Online access to the database schema and constraints. Interactive formulation of SQL queries over the online database.
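As an illustration of the kinds of queries covered in this part, the following minimal Python sketch uses the standard sqlite3 module with an in-memory database. The two-table schema (organism, protein), its column names, and the sample rows are hypothetical and are not the protein database used in the course; they only serve to show selection and projection, an inner join, a group-by aggregate, and a nested query.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    length      INTEGER,  -- may be NULL (incomplete information)
    organism_id INTEGER REFERENCES organism(organism_id)  -- foreign key
);
INSERT INTO organism VALUES (1, 'Homo sapiens'), (2, 'Mus musculus');
INSERT INTO protein VALUES
    (10, 'P53',     393,  1),
    (11, 'INS',     110,  1),
    (12, 'Trp53',   390,  2),
    (13, 'Unknown', NULL, 2);
""")

# Selection (WHERE) and projection (SELECT list).
# Rows with a NULL length do not satisfy the comparison and are excluded.
long_proteins = conn.execute(
    "SELECT name FROM protein WHERE length > 300 ORDER BY name"
).fetchall()

# Inner join between protein and organism on the foreign key.
human_proteins = conn.execute("""
    SELECT p.name, o.name
    FROM protein AS p
    JOIN organism AS o ON p.organism_id = o.organism_id
    WHERE o.name = 'Homo sapiens'
    ORDER BY p.name
""").fetchall()

# Group-by query with aggregates: COUNT(*) counts rows,
# AVG ignores NULL values.
per_organism = conn.execute("""
    SELECT o.name, COUNT(*) AS n_proteins, AVG(p.length) AS avg_length
    FROM protein AS p
    JOIN organism AS o ON p.organism_id = o.organism_id
    GROUP BY o.name
    ORDER BY o.name
""").fetchall()

# Nested query: the protein(s) whose length equals the maximum length.
longest = conn.execute("""
    SELECT name FROM protein
    WHERE length = (SELECT MAX(length) FROM protein)
""").fetchall()

print(long_proteins)
print(human_proteins)
print(per_organism)
print(longest)
```

Using an in-memory database keeps the sketch self-contained; the same SQL statements could be typed directly into any interactive SQL console.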
"Statistics applied to epidemiology"
Introduction to the course (objectives, final evaluation criteria). All of the following theoretical topics are accompanied by examples.
Collecting data sets: populations and samples. Describing data sets: frequency tables and graphs. Summarizing data sets: mean, median, mode. Outliers. Variability: variance, standard deviation. Describing data sets: sets of paired data, scatter diagram, least squares regression line, slope (linear regression coefficient), intercept. Qualitative and quantitative evaluation of linear regression. The correlation coefficient. Interpreting regression and correlation.
Probability. Definitions: experiment, outcome, sample space, event. Rules of probability and independent events. Conditional probability and Bayes' theorem. Probability distribution, expected value. Binomial, Gaussian, Student's t, Fisher and chi-squared distributions.
Standardizing normally distributed random variables. Population and sample. Population mean and variance. Sample mean, expected value of the sample mean, variance of the sample mean, standard deviation of the sample mean. Central limit theorem.
Epidemiological indicators: prevalence and incidence, absolute and relative risks, odds ratios and hazard ratios. Confidence intervals. Statistical inference, null hypothesis and statistical significance. Student's t test. Chi-squared test, non-parametric tests.
Main sources of bias and their relation with accuracy and precision. True positives, true negatives, false positives, false negatives. Sensitivity, specificity and ROC curve. Graphical representation.
Types of epidemiological studies. Observational studies: cross-sectional, case-control, cohort. Experimental studies: randomized controlled trials, field trials, community trials. Potential limitations in epidemiological studies.
Main criteria for evaluation of scientific studies with special reference to statistical studies applied to epidemiology. Understanding the statistical methodology, the results, and their interpretation with examples from scientific publications.
Review of the whole program with exercises on the board.
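As an illustration of the diagnostic-test measures and epidemiological indicators listed above, the following minimal Python sketch computes sensitivity, specificity, relative risk, and odds ratio from 2x2 tables. The function names and all counts are hypothetical, chosen only to make the arithmetic easy to follow; confidence intervals and significance tests are deliberately left out.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of diseased subjects that the test classifies as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of healthy subjects that the test classifies as negative."""
    return tn / (tn + fp)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical diagnostic test evaluated on 100 diseased and 100 healthy subjects.
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)

# Hypothetical cohort study: 100 exposed and 100 unexposed subjects.
rr = relative_risk(a=30, b=70, c=10, d=90)
oratio = odds_ratio(a=30, b=70, c=10, d=90)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"relative risk = {rr:.2f}, odds ratio = {oratio:.2f}")
```

With these counts the odds ratio is larger than the relative risk, which is the usual pattern when the outcome is not rare.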
Prerequisites
Students are expected to know the basic mathematics covered in a three-year (bachelor's) degree course.
Teaching methods
"Informatics and Databases." The teaching consists of lectures, supported by slides and blackboard exercises. Slides, which follow the contents of the lectures, are available on the MyAriel website https://myariel.unimi.it/course/view.php?id=3161.
"Statistics applied to epidemiology." The teaching consists of lectures, supported by slides and blackboard exercises. Slides, which follow the contents of the lectures, are available on the MyAriel website https://myariel.unimi.it/course/view.php?id=3314. During the course, printed statistical tables (also available on the same MyAriel website) are distributed so that students can follow the analyses presented during the lessons directly.
Reference materials
"Informatics and Databases"
- P. Atzeni, S. Ceri, S. Paraboschi, R. Torlone, Database Systems - Concepts, Languages and Architectures, McGraw-Hill, available online at http://dbbook.dia.uniroma3.it/
Chapters: 1 (whole), 2 (whole), 3 (up to and including §3.1.6), 4 (only §4.2 and its subsections), 5 (only §5.2 and its subsections)
- Lecture slides downloadable from the MyAriel website (https://myariel.unimi.it/course/view.php?id=3161).
"Statistics applied to epidemiology"
The teaching material consists of the slides uploaded to the MyAriel website and of the following books:
- Barbara Illowsky, Susan Dean (2013), Introductory Statistics by OpenStax. 1st Edition, XanEdu Publishing Inc.
https://openstax.org/details/books/introductory-statistics
- Beaglehole, Robert, Bonita, Ruth, Kjellström, Tord & World Health Organization (1993). Basic epidemiology. Updated reprint, World Health Organization. https://apps.who.int/iris/bitstream/handle/10665/36838/9241544465.pdf?sequence=1&isAllowed=y
- Darrell Huff (1991), How to Lie with Statistics. Penguin. https://www.horace.org/blog/wp-content/uploads/2012/05/How-to-Lie-With-Statistics-1954-Huff.pdf
Assessment methods and evaluation criteria
The course exam consists of two separate exams, one for the "Informatics and databases" part of the course and one for the "Statistics applied to epidemiology" part. The grade for each part is expressed out of thirty. The final grade for the course, also out of thirty, is the average of the two part grades.
"Informatics and Databases".
The exam consists of a single written test (1 hour); no intermediate tests are planned. It covers all the topics presented during the lectures and consists of multiple-choice questions and exercises. The exam aims to verify that the course objectives have been achieved, namely that students have learned the basic concepts of relational databases and are able to formulate SQL queries that answer given requests.
The same assessment methods and criteria apply to attending and non-attending students.
"Statistics applied to epidemiology."
The exam consists of a single written test (2.5 hours); no intermediate tests are planned. A paper from an international indexed journal, reporting a study analysed with statistical methods presented in class, will be assigned. Students will have to answer open questions on the statistical methods used in the paper. To pass the exam, students must demonstrate that they:
- understand the concepts of epidemiological studies and basic statistics;
- know how to apply the knowledge acquired to real situations;
- know how to interpret the results of the analyses carried out.
The same assessment methods and criteria apply to attending and non-attending students.
Modules or teaching units
Informatics and Databases
INF/01 - INFORMATICS - CFU (credits): 3
Lectures: 24 hours
Lecturer:
Castano Silvana
Statistics applied to Epidemiology
SECS-S/01 - STATISTICS - CFU (credits): 3
Lectures: 24 hours
Lecturer:
Adorni Fulvio Daniele
Course websites
Lecturer(s)
Office hours:
By appointment via email - Palazzo LITA, via F.lli Cervi 93, Segrate (Milano)
Office hours:
By appointment via email
Online or Via Celoria 18 - Room 7012