In Spring 2024, I gave a tutorial titled "Toward a taxonomy of trust for probabilistic data analysis." The full materials and errata are on this page.
Abstract: Probabilistic data analysis increasingly informs critical decisions in medicine, economics, education, and beyond. A major concern is generalization: if we conclude that an economic or health intervention helps people based on a data analysis, we hope that it will indeed help people when deployed in the future. We might be concerned about generalization, though, if two analysts could come to different conclusions when trying to answer the same question with data. In this talk, we discuss how such a discrepancy could happen to two well-meaning data analysts, who aren't being targeted by adversaries and who are using standard data analysis tools. In particular, we examine potential challenges and mitigations at multiple steps of a data analysis: (i) in the collection of data, (ii) in the translation of abstract goals on the data to a concrete mathematical problem, (iii) in the use of an algorithm to solve the stated mathematical problem, and (iv) in the use of a particular code implementation of the chosen algorithm.
Video:
Slides:
Errata: