Health Informatics

A.Y. 2023/2024
Max ECTS: 3
Overall hours: 36
SSD: INF/01
Language: English
Learning objectives
The Health Informatics course is divided into two modules that deal with the use of software for basic data analysis. The first module uses Excel functions to illustrate basic data analysis functionality; the second module uses the Python programming language.
Expected learning outcomes
At the end of the first module, students will be able to use the Excel Analysis ToolPak and pivot tables for statistical calculations. At the end of the second module, students will be able to use a full programming language for basic data manipulation and statistical analysis.
Single course

This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.

Course syllabus and organization

Single session

Responsible
Course syllabus
The course will address the following topics in 3 teaching modules:

Course Introduction
Research problem classification
Data sources and data types
Compute data statistics
Data summarization
Graphical Programming Tool: Orange
Data Preprocessing
Supervised/Unsupervised Learning
General modelling concepts and workflow
Classification Trees
Logistic Regression
Model Interpretation
(Linear Regression, Clustering)



MODULE 1: DATA EXPLORATION AND DESCRIPTIVE ANALYSIS
The first part is closely connected with the biostatistics course and involves using Excel to present patient data and make statistical comparisons.
· Research problems categorization
· Structured and semi-structured data sources
Tables: Records and features
Data Types (nominal, ordinal, ranked, discrete, continuous)
· Exploratory Data Analysis
Statistics, query, summarization
Graphical representation
References for this part are (numbers refer to BIBLIOGRAPHY section):
· Spreadsheets: a few tips (Ariel platform)
· [4] Chapter 2,3
· [1] Chapter 1
· [3] Chapter 2,3
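
As a rough illustration of the "Statistics, query, summarization" topics in this module, the sketch below computes descriptive statistics for one feature and then summarizes it by group, much like a pivot table would. It is written in plain Python rather than Excel, and the records, feature names (`sex`, `smoker`, `age`, `sbp`) and helper functions are invented for the example; they are not course material.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical table: each dict is a record (row), each key a feature (column).
# All values are invented for illustration only.
patients = [
    {"sex": "F", "smoker": "no", "age": 34, "sbp": 118},
    {"sex": "M", "smoker": "yes", "age": 51, "sbp": 135},
    {"sex": "F", "smoker": "yes", "age": 47, "sbp": 128},
    {"sex": "M", "smoker": "no", "age": 29, "sbp": 121},
    {"sex": "F", "smoker": "no", "age": 62, "sbp": 140},
    {"sex": "M", "smoker": "yes", "age": 58, "sbp": 142},
]


def describe(values):
    """Return basic descriptive statistics for a continuous feature."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
    }


def summarize_by(records, group_key, value_key):
    """Group records by a nominal feature and describe a numeric one per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: describe(values) for group, values in groups.items()}


overall = describe([p["sbp"] for p in patients])
by_smoker = summarize_by(patients, "smoker", "sbp")

print(overall)
print(by_smoker)
```

Grouping by a nominal feature and describing a continuous one is the same operation a pivot table performs with "smoker" as the row field and "sbp" as the value field.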

MODULE 2: DATA PREPROCESSING FOR PREDICTIVE MODELS
Graphical Programming Tool: Orange
· Installation of the tool and interface exploration
· Simple use cases
Data Preprocessing
· Outliers, Missing Values, Data Representation, Standardization, Discretization, Feature Engineering, (Unbalanced Data)
References for this module are:
· [4] Chapter 2, 4
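
The preprocessing topics listed for this module can be sketched in a few lines of plain Python. The course itself uses Orange, a graphical tool, so this is only an illustrative assumption of what each step does, not the course's workflow. The glucose readings, the cut points and the function names are hypothetical; median imputation and the 1.5 × IQR rule are just one common choice among several.

```python
from bisect import bisect_right
from statistics import mean, median, pstdev, quantiles

# Hypothetical glucose readings (mg/dL); None marks a missing value.
glucose = [90, 95, None, 102, 88, 250, 97]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def discretize(values, cut_points, labels):
    """Map each value to the label of the interval it falls in."""
    return [labels[bisect_right(cut_points, v)] for v in values]


filled = impute_median(glucose)
flags = iqr_outlier_flags(filled)
z_scores = standardize(filled)
# Illustrative cut points only: below 100, 100 up to (not including) 126, 126 and above.
levels = discretize(filled, cut_points=[100, 126], labels=["low", "medium", "high"])

print(filled)
print(flags)
print(levels)
```

Note that the single extreme reading inflates the mean and standard deviation used for standardization, which is one reason outliers are usually inspected before rescaling.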

MODULE 3: INTRODUCTION TO STATISTICAL LEARNING FOR PREDICTIVE MODELS
Supervised/Unsupervised Learning
· Classification, Regression, Clustering
Model Workflow
· Bias/Variance and overfitting, Holdout method
Supervised Learning
· Feature selection
· Classification: Classification Tree, Logistic Regression
· (Regression: Linear Regression)
(Unsupervised Learning)
· Dimensionality Reduction: PCA, Data embedding
· Clustering
References for this module are:
· [4] Chapter 19, 20
· [5] Chapter 2, 3.1-3.3, 4.1-4.3, 5, 8, (12)
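
To make the holdout method and the classification tree topics of this module concrete, here is a minimal sketch in plain Python. It splits a toy dataset into training and test sets, fits a one-split tree (a "decision stump") by minimizing Gini impurity on the training set, and measures accuracy on the held-out set. The data, the function names and the 70/30 split are assumptions made for this example; the course works with Orange, whose tree learner is more elaborate than this.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent binary label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows):
    """Find the threshold on x that minimizes the weighted Gini impurity."""
    rows = sorted(rows)
    best = None
    for (x1, _), (x2, _) in zip(rows, rows[1:]):
        if x1 == x2:
            continue
        threshold = (x1 + x2) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Toy data (invented): one numeric feature, binary outcome, well separated classes.
rows = [(float(v), 0) for v in range(1, 11)] + [(float(v), 1) for v in range(50, 60)]

train, test = holdout_split(rows, test_fraction=0.3, seed=0)
stump = fit_stump(train)
accuracy = sum(predict(stump, x) == y for x, y in test) / len(test)

print(stump, accuracy)
```

Because the model is fitted on the training rows only, the accuracy on the test rows estimates performance on unseen data; evaluating on the training rows instead would hide overfitting. The toy classes are deliberately easy to separate, so real data would give a lower and more informative score.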
Prerequisites for admission
To take the Health Informatics exam, students must have already passed all the exams of the first year (Fundamentals of Basic Sciences, Cells Molecules and Genes 1 and 2, Human Body) and the exam of Functions.

· Good knowledge of Excel.
Suggested material: https://support.office.com/en-us/article/introduction-to-excel-starter601794a9-b73d-4d04-b2d4-eed4c40f98be
· Basic knowledge of analysis and statistics
Teaching methods
Synchronous learning: The course relies mainly on lectures given by the teachers.
Asynchronous learning: Data from the literature will be provided so that students can practise data analysis with software.

ATTENDANCE:
Attendance is required to be allowed to take the exam. Unexcused absence is tolerated up to 34% of the course activities. University policy regarding excused illness is followed.
Teaching Resources
Bibliography
1. Leslie E. Daly and Geoffrey J. Bourke, "Interpretation and Uses of Medical Statistics", 5th edition. (Available online in Unimi library)
2. J. Mark Elwood, "Critical Appraisal of Epidemiological Studies and Clinical Trials", 3rd Edition, Oxford University Press.
3. Douglas G. Altman, "Practical Statistics for Medical Research", Chapman and Hall.
4. Marcello Pagano, Kimberlee Gauvreau, "Principles of Biostatistics", 2000, Duxbury Press. (Available online in Unimi library)
5. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. "An Introduction to Statistical Learning: with Applications in R". New York : Springer, 2013. (Available online)

Software
· Microsoft Excel
· Orange Data Mining https://orangedatamining.com/
Assessment methods and Criteria
The exam consists of a written test containing both theoretical and practical questions.
The written test is delivered on the Moodle platform and consists mainly of multiple-choice questions and short-answer numerical questions. Online statistical calculators will be needed to answer some of the numerical questions. Grades are on a 30-point scale, and a minimum of 18/30 is required to pass the written test.

Registration to the exam through SIFA is mandatory.
INF/01 - INFORMATICS - University credits: 3
Lessons: 24 hours
: 12 hours
Educational website(s)
Professor(s)
Reception:
By appointment (email)
Laboratorio di Statistica Medica, Biometria ed Epidemiologia "G.A. Maccacaro", Via Celoria 22, Milano