Omitted and included variable bias in tests for disparate impact
Policymakers often seek to gauge discrimination against groups defined by race, gender, and other protected attributes. A common strategy is to estimate disparities after controlling for observed covariates in a regression model. However, not all relevant factors may be available to researchers, leading to omitted variable bias. Conversely, controlling for all available factors may also skew results, leading to so-called "included variable bias". We introduce a simple strategy, which we call risk-adjusted regression, that addresses both concerns in settings where decision makers have clear and measurable policy objectives. First, we use all available covariates to estimate the expected utility of possible decisions. Second, we measure disparities after controlling for these utility estimates alone, omitting other factors. Finally, we examine the sensitivity of results to unmeasured confounding. We demonstrate this method on a detailed dataset of 2.2 million police stops of pedestrians in New York City.
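The first two steps of risk-adjusted regression can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the expected utility of a decision reduces to the probability of a binary "hit" outcome. It also assumes that outcome is observed for every unit, which real stop data usually does not allow. The group indicator is left out of the risk model, and estimated risk enters the second regression linearly on the logit scale. The helper names `fit_logistic`, `risk_adjusted_regression` and `simulate` are hypothetical, and the data are synthetic. The third step, the sensitivity analysis for unmeasured confounding, is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                 # IRLS weights
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * w[:, None])     # observed information matrix
        beta += np.linalg.solve(hess, grad)
    return beta


def risk_adjusted_regression(X, group, outcome, decision):
    """Return the group coefficient after controlling only for estimated risk."""
    n = len(outcome)
    # Step 1: estimate risk (a stand-in for expected utility) from all covariates.
    Xd = np.column_stack([np.ones(n), X])
    risk = np.clip(sigmoid(Xd @ fit_logistic(Xd, outcome)), 1e-6, 1 - 1e-6)
    # Step 2: regress the decision on group and the risk estimate alone,
    # omitting the raw covariates.
    Z = np.column_stack([np.ones(n), group, np.log(risk / (1.0 - risk))])
    return fit_logistic(Z, decision)[1]


def simulate(bias, n=50_000, seed=0):
    """Synthetic decisions that depend on true risk plus `bias` times group."""
    rng = np.random.default_rng(seed)
    group = rng.binomial(1, 0.5, n)
    # Covariates are correlated with group membership.
    X = rng.normal(size=(n, 2)) + 0.8 * group[:, None]
    true_logit = -1.0 + X @ np.array([1.0, -0.5])
    outcome = rng.binomial(1, sigmoid(true_logit))
    decision = rng.binomial(1, sigmoid(0.5 + 1.5 * true_logit + bias * group))
    return risk_adjusted_regression(X, group, outcome, decision)


if __name__ == "__main__":
    print("no group bias:   ", round(simulate(bias=0.0), 3))
    print("group bias = 0.7:", round(simulate(bias=0.7), 3))
```

In this toy setup, the coefficient on group should come out near zero when decisions depend only on risk. It should come out near the injected value when they do not. Controlling for the single risk estimate, rather than for every covariate, is the design choice the abstract describes.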
This is joint work with Sam Corbett-Davies, Jongbin Jung, and Sharad Goel.