Good to Great Speaker Series

Organized by Assistant Professor Ju Yeon (Julia) Park

Fall 2024

October 2, 2024, 11:15am-12:30pm, Derby 2130: Naoki Egami, "Using Large Language Model Annotations for the Social Sciences: A General Framework of Using Predicted Variables in Statistical Analyses"

Dr. Naoki Egami, an Assistant Professor in the Department of Political Science at Columbia University, will speak on how to use large language models for data annotation without biasing the downstream analysis. Email: naoki.egami@Columbia.Edu. Professional Website: naokiegami.com

Abstract: Social scientists use automated annotation methods, such as supervised machine learning and, more recently, large language models (LLMs), that can predict labels and generate text-based variables. While such predicted text-based variables are often analyzed as if they were observed without errors, we show that ignoring prediction errors in the automated annotation step leads to substantial bias and invalid confidence intervals in downstream analyses, even if the accuracy of the automated annotations is high, e.g., above 90%. We propose a framework of design-based supervised learning (DSL) that can provide valid statistical estimates, even when predicted variables contain non-random prediction errors. DSL employs a doubly robust procedure to combine predicted labels and a smaller number of expert annotations. DSL allows scholars to apply advances in LLMs to social science research while maintaining statistical validity. We illustrate its general applicability using two applications where the outcome and independent variables are text-based.

Spring 2025

Spring 2025 (date TBD), Bruce A. Desmarais

Bruce A. Desmarais (he/him) is the DeGrandis-McCourtney Early Career Professor in Political Science, Director of the Center for Social Data Analytics, and Co-Hire for the Institute for Computational and Data Sciences at Pennsylvania State University. Professional Website: brucedesmarais.com

Past Speaker - Spring 2024

April 16, 2024: Michelle Torres, "Beyond Prediction: Identifying Latent Treatments in Images"

Michelle Torres, an assistant professor at UCLA, is a political methodologist with expertise in image analysis using computer vision and machine learning techniques. Her research classifies political visual messages and frames to understand their role in the generation and processing of political information. She presented her recent working paper: "Beyond Prediction: Identifying Latent Treatments in Images."

Abstract: Images are a rich and crucial element of political communication. The complexity of the information they convey creates challenges for the identification, interpretation, and explanation of the effects of visual messages on information processing and attitude formation. In this article, we adapt a methodological approach used in text analysis, the supervised Indian Buffet Process (sIBP) developed by Fong and Grimmer (F&G, 2016, 2021), to identify latent treatments in images and evaluate their impact on outcomes of interest. First, we use a convolutional neural network (CNN) to decompose images into substantively meaningful and interpretable tokens, visual words, to then form the input of the sIBP. Then, we follow the framework introduced by F&G and demonstrate the utility of this approach using two datasets: 1) a novel experiment measuring attitudes towards climate change in response to visual frames and 2) images of the Black Lives Matter (BLM) movement protests manually labeled by human coders according to the level of conflict they depict. We find significant differences between demographic groups in the way they perceive images, and also unmask latent treatments that confound the relationship between our treatment and outcomes of interest. Importantly, this paper extends the usage of computer vision tools in social sciences beyond prediction of image labels to uncovering, understanding, and visualizing the features of images that produce outcomes.