Methods for difference-in-differences studies
Difference-in-differences is a popular method for evaluating policy interventions. In a diff-in-diff study, the change in outcomes in the treated group from before to after an intervention is compared to the contemporaneous change in an untreated comparison group. Causal conclusions rely on the assumption that the comparison group's change provides a valid counterfactual for the treated group. A small literature addresses the statistical and causal properties of difference-in-differences estimators. We contribute to this literature by investigating four questions: 1) What is the impact of matching treated and control units on pre-intervention variables? 2) How useful is a test of parallel outcome trends in the pre-intervention period? 3) What is the definition of confounding in diff-in-diff? and 4) What impact do time-series and hierarchical variance patterns have on diff-in-diff inference? We find that many applied diff-in-diff studies suffer from hidden biases and poor statistical and causal performance due to inappropriate matching, mis-specified tests, misunderstanding of confounding, and failure to account for variance structures. We argue that more thorough robustness checks are needed to engender confidence in the conclusions of diff-in-diff studies.
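For concreteness, the comparison described above can be written as the canonical two-group, two-period estimand. This is a minimal sketch in standard notation, not notation taken from this paper: $\bar{Y}_{g,t}$ denotes the mean outcome for group $g \in \{T, C\}$ (treated, comparison) in period $t \in \{\text{pre}, \text{post}\}$, and $Y^{(0)}$ denotes the untreated potential outcome.

\[
\widehat{\delta}_{\text{DiD}} = \bigl(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\bigr) - \bigl(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\bigr)
\]

This quantity identifies the average treatment effect on the treated only under the parallel-trends assumption, which formalizes the requirement that the comparison group's change serve as a valid counterfactual:

\[
\mathbb{E}\bigl[Y^{(0)}_{\text{post}} - Y^{(0)}_{\text{pre}} \mid g = T\bigr] = \mathbb{E}\bigl[Y^{(0)}_{\text{post}} - Y^{(0)}_{\text{pre}} \mid g = C\bigr]
\]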