Big data analytics
Academic year 2024/2025
Learning objectives
Recent years have witnessed a dramatic increase in interest in the analysis of texts in the social sciences. This is largely due to the development of new methods that facilitate substantively important inferences about politics and society from large text collections. This course aims to provide an introductory guide to this exciting new area of research, while also offering guidelines on how to effectively use text methods for social scientific research.
Expected learning outcomes
By the end of the course, students will know how to:
- effectively use statistical methods on texts for social scientific research;
- discuss the advantages, but also the limits, of each approach.
Course evaluation aims to verify the expected learning outcomes in relation to these topics.
Period: First trimester
Assessment method: Exam
Grading: final mark recorded out of 30
Single course
This course can be taken as a stand-alone single course.
Syllabus and course organization
Single edition
Course coordinator
Period
First trimester
Syllabus
Big data are those labeled, for strange reasons, with the capitalized "Big". Nevertheless, they are still "data", although with some specific characteristics: large volume, high frequency and, most notably, unpredictability. Data come in many different forms: they are raw, messy, unstructured, not ready for processing, and so on. Still, these data convey a lot of information to social scientists, and statistical techniques are required to extract meaningful results from them. In this course we will focus on a specific type of big data, namely digital texts, both from social media and from other sources (such as legislative speeches or electoral programs). The aim is to provide an introductory guide to this exciting new area of research, while also offering guidelines on how to effectively use statistical methods on texts for social scientific research, discussing the advantages, but also the limits, of each approach. Attention will be devoted to five main areas:
1) (supervised and unsupervised) scaling methods, which allow us to estimate the location of actors along a latent space;
2) supervised classification methods, including machine learning algorithms, which allow us to organize texts into a set of pre-defined categories;
3) unsupervised classification methods, which allow us to discover new ways of organizing texts into a set of previously unknown categories;
4) semi-supervised classification methods;
5) word-embedding techniques.
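As an illustration of the first area (supervised scaling), the following is a minimal sketch of the Wordscores idea described by Laver, Benoit and Garry (2003), listed among the readings. It is a hedged toy version, not the course's lab material: the labs use R, while this sketch uses plain Python, and the function names, toy texts and reference positions are hypothetical. Each word receives a score equal to the average position of the reference texts, weighted by how likely each reference text is given that word. A new ("virgin") text then receives the frequency-weighted mean of the scores of its words. This sketch scores only words that occur in the reference texts, computes frequencies over those words alone, and omits the rescaling step of the original procedure.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word in a (non-empty) tokenised text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score every word that appears in at least one reference text.

    reference_positions holds the known (a priori) position of each
    reference text on the latent dimension, e.g. -1 (left) and +1 (right).
    """
    freqs = [relative_freqs(t) for t in reference_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        f = [fr.get(w, 0.0) for fr in freqs]
        total = sum(f)
        # P_wr: probability that we are reading reference text r, given word w
        p = [x / total for x in f]
        # S_w: expected position of the text, given that we observe word w
        scores[w] = sum(pr * a for pr, a in zip(p, reference_positions))
    return scores


def score_virgin_text(tokens, scores):
    """Raw (unrescaled) position of a new text: mean of its word scores,
    weighted by relative frequency among the words that have a score."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("the text shares no words with the reference texts")
    freqs = relative_freqs(scored)
    return sum(f * scores[w] for w, f in freqs.items())


if __name__ == "__main__":
    # Hypothetical toy reference texts, placed at -1 and +1
    left = "tax the rich expand welfare public services".split()
    right = "cut taxes reduce welfare free markets".split()
    scores = word_scores([left, right], [-1.0, 1.0])
    print(scores["welfare"])  # shared word: lies between -1 and +1
    print(score_virgin_text("expand public services cut taxes".split(), scores))
```

In this toy example, words found in only one reference text inherit that text's position. "welfare", which appears in both, receives 6/13 × (-1) + 7/13 × (+1) = 1/13, because it is relatively more frequent in the shorter right-wing text. The virgin text contains three words scored -1 and two scored +1, so its raw score is -0.2.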
Prerequisites
An elementary knowledge of R, plus curiosity about applied statistics, is a good prerequisite for the lab sessions.
Teaching methods
Lab sessions are a crucial part of the course: they offer hands-on experience with the techniques and statistical methods discussed in class. All the datasets, replication files of the lab sessions and reference texts will be made available at a dedicated URL before the beginning of the course. Enrolled students should bring their own laptop with R, RStudio and the relevant packages already installed and working (instructions will be circulated beforehand).
Reference material
Benoit, Kenneth. 2020. Text as Data: An Overview. In: Luigi Curini and Robert Franzese (eds.), The SAGE Handbook of Research Methods in Political Science and International Relations. London: Sage, 461-497.
Grimmer, Justin, and Brandon M. Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297.
Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review, 97(2): 311-331.
Proksch, Sven-Oliver, and Jonathan B. Slapin. 2008. A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science, 52(3): 705-722.
Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. 2014. Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4): 1064-1082.
Eshima, Shusei, Kosuke Imai, and Tomoya Sasaki. 2023. Keyword-Assisted Topic Models. American Journal of Political Science. DOI: 10.1111/ajps.12779.
Curini, Luigi, and Robert Fahey. 2020. Sentiment Analysis. In: Luigi Curini and Robert Franzese (eds.), The SAGE Handbook of Research Methods in Political Science and International Relations. London: Sage, 534-551.
Barberá, Pablo, and Zachary C. Steinert-Threlkeld. 2020. How to Use Social Media Data for Political Science Research. In: Luigi Curini and Robert Franzese (eds.), The SAGE Handbook of Research Methods in Political Science and International Relations. London: Sage, 404-423.
Further readings will be suggested during the course. Please check the course home page regularly and contact the professor with any further questions.
Assessment methods and evaluation criteria
For enrolled students, course grades will be based on home assignments and class participation. Instructions for non-enrolled students will be circulated later.
SPS/04 - POLITICAL SCIENCE - Credits (CFU): 6
Lectures: 40 hours
Lecturer:
Curini Luigi
Office hours:
Wednesday 12:45-15:45
Room 319, via Conservatorio 7, Department of Social and Political Sciences. Students are advised to email the lecturer to book an appointment.