How to do regression discontinuity when data come in clusters — and when standard errors can fail
This paper studies a common problem in applied research: regression discontinuity (RD) studies often collect data in clusters, while the usual statistical theory assumes independent observations. The authors introduce a general model for clustered RD designs. They show when the standard local linear RD estimator (a common way to estimate the jump at a cutoff) remains asymptotically normal, i.e., still behaves like a normal statistic in large samples. They also propose a new variance estimator designed for clustered settings.
The researchers model data as groups of units (clusters) that are independent across groups but may be dependent within each group. The outcome for an observation is written as Y = µ(X) + ε, where X is the running variable that determines treatment at a known cutoff, µ(x) is the conditional mean, and ε is an error term that may be correlated within clusters. The parameter of interest is the jump in µ at the cutoff, τ = µ(0+) − µ(0−). The commonly used estimator is the local linear RD estimator, which is a weighted average of outcomes near the cutoff. The authors derive high-level conditions on those weights that translate into restrictions on how many units from each cluster fall inside the estimation window. They check these conditions across several stylized asymptotic frameworks that capture different patterns of cluster size, within-cluster dependence of the running variable, and the within-cluster covariance of the outcome.
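To make the estimator concrete, here is a minimal sketch of a local linear RD fit: separate weighted linear regressions on each side of the cutoff with a triangular kernel, with the jump estimated as the difference of the two boundary intercepts. The function name, the kernel choice, and the fixed bandwidth `h` are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def local_linear_rd(y, x, h, cutoff=0.0):
    """Sketch of a local linear RD estimate of tau = mu(0+) - mu(0-).

    Fits a weighted linear regression on each side of the cutoff using
    a triangular kernel with bandwidth h; the intercept of each fit
    estimates the boundary value mu(0+) or mu(0-).
    """
    intercepts = []
    for side in (x >= cutoff, x < cutoff):  # treated side first
        xs, ys = x[side] - cutoff, y[side]
        w = np.clip(1.0 - np.abs(xs) / h, 0.0, None)  # triangular weights
        keep = w > 0  # drop observations outside the bandwidth window
        X = np.column_stack([np.ones(keep.sum()), xs[keep]])
        W = np.diag(w[keep])
        # weighted least squares: solve (X'WX) beta = X'W y
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ ys[keep])
        intercepts.append(beta[0])
    return intercepts[0] - intercepts[1]
```

Note that this is exactly a weighted average of outcomes near the cutoff, as the text says: the kernel and the regression design jointly determine the weight each observation receives, and the cluster-size conditions in the paper restrict how those weights distribute across clusters.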
On the question of standard errors, the paper shows a mixed picture. The usual cluster-robust, regression-residual-based standard errors are consistent under the same cluster-size conditions that guarantee the estimator’s asymptotic normality. But in many finite-sample situations those standard errors can be either inconsistent or overly conservative. The paper points out that a nearest-neighbor style standard error that works for independent data does not carry over naively to clustered data. To fix this, the authors propose a clustered nearest-neighbor (CNN) standard error. The CNN estimator chooses neighbors while accounting for the clustering and uses the independence across clusters to estimate variance. They prove consistency of the CNN estimator under their high-level assumptions and illustrate its behavior in several empirical applications.
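The core idea behind a cluster-aware variance estimator can be sketched in a few lines: for a linear statistic (a weighted sum of residuals), sum the weighted residuals within each cluster first, then square and sum across clusters, so within-cluster covariance is retained and only independence across clusters is exploited. This is the generic cluster-robust construction, not the paper's exact CNN estimator; `cluster_robust_var` and its inputs are illustrative.

```python
def cluster_robust_var(weights, resid, cluster_ids):
    """Sketch of a cluster-robust variance for tau_hat = sum_i w_i * Y_i.

    Weighted residuals are accumulated within each cluster before
    squaring, so within-cluster covariance terms are kept; summing
    the squared cluster totals uses independence across clusters.
    """
    totals = {}
    for g, w, e in zip(cluster_ids, weights, resid):
        # accumulate w_i * e_i within cluster g
        totals[g] = totals.get(g, 0.0) + w * e
    # independence across clusters: variance estimate is the
    # sum of squared within-cluster totals
    return sum(t ** 2 for t in totals.values())
```

For comparison, treating every observation as its own cluster collapses this to the usual heteroskedasticity-robust sum of squared terms, which is exactly what fails to carry over naively when outcomes are dependent within clusters.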