Statistics for Big Data
A.Y. 2018/2019
Learning objectives
The course aims to introduce and illustrate specific statistical, computing, and data mining methodologies for the analysis of Big Data. These techniques will be implemented using the statistical software R. By the end of the course, students should have acquired adequate statistical and programming skills that allow them to master the statistical and computing tools needed to analyze data and to extract the information of interest from the data themselves.
Lesson period: Third trimester
Assessment methods: Exam
Assessment result: mark recorded on a thirty-point scale
Single course
This course cannot be attended as a single course. Please check our list of single courses to find the ones available for enrolment.
Course syllabus and organization
Single session
ATTENDING STUDENTS
Course syllabus
NON-ATTENDING STUDENTS
The course will be organized according to the following topics:
FIRST PART:
1) DATA MINING TECHNIQUES 1: supervised models
1.1 generalized linear models (logit, probit and tobit)
1.2 multilevel models
2) DATA MINING TECHNIQUES 2: unsupervised models
2.1 cluster analysis
2.2 principal component analysis
2.3 factor analysis
2.4 cross-validation
2.5 text mining
SECOND PART:
1) Introduction to programming in R and Python
2) Data mashup techniques
3) Cloud computing techniques
4) Web scraping techniques
5) Interaction with relational and non-relational databases
6) Big data analytics
Course syllabus
The course will be organized according to the following topics:
FIRST PART:
1) DATA MINING TECHNIQUES 1: supervised models
1.1 generalized linear models (logit, probit and tobit)
1.2 multilevel models
2) DATA MINING TECHNIQUES 2: unsupervised models
2.1 cluster analysis
2.2 principal component analysis
2.3 factor analysis
2.4 cross-validation
2.5 text mining
SECOND PART:
1) Introduction to programming in R and Python
2) Data mashup techniques
3) Cloud computing techniques
4) Web scraping techniques
5) Interaction with relational and non-relational databases
6) Big data analytics
SECS-S/01 - STATISTICS - University credits: 6
Lessons: 40 hours
Professor: Manzi Giancarlo