(EDIT: Professor Richard Hahn was kind enough to let me know about his work on RDD, which I have now included in this post :) apologies for any mistakes - I had to do it on the iPad)
Hi there!
This post is not related to the DiD literature but I saw these new developments in Regression Discontinuity Design and Instrumental Variables and thought about letting you know.
Here’s the list; then I’ll go through them quickly, focusing on how and when you can benefit from them:
Optimal Formula Instruments, by Kirill Borusyak and Peter Hull
Donut RDDs, by Claudia Noack and Christoph Rothe
Flexible Covariate Adjustments in RDDs, by Claudia Noack, Tomasz Olma and Christoph Rothe
A Partial Linear Estimator for Small Study RDDs, by Daryl Swartzentruber and Eloise Kaizar
Treatment Effect Heterogeneity in RDDs, by Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell, Filippo Palomba and Rocio Titiunik
Learning Conditional Average Treatment Effects in Regression Discontinuity Designs using Bayesian Additive Regression Trees, by Rafael Alcantara, P. Richard Hahn, Carlos Carvalho, and Hedibert Lopes
Optimal Formula Instruments
TL;DR: this paper presents a method to build better IVs for treatments based on complex formulas (like benefit eligibility). It makes smarter use of individual characteristics, corrects for endogeneity, and leads to much more precise estimates - especially useful in regional/policy studies.
In this paper, Kirill Borusyak and Peter Hull propose a new method to construct more powerful IVs for treatments defined by complex formulas (such as eligibility for public programs, which depends on income, family status, and the state the person lives in). It builds on and generalizes the widely used simulated-instrument approach (e.g., Currie & Gruber, 1996) - roughly, “let’s imagine what would happen to a typical person if this policy were different in their state” - which is somewhat weak because it doesn’t account for how differently individuals are affected by the policy. They introduce what they call "optimal formula instruments", which adjust for heterogeneous shock exposure while ensuring instrument validity through a technique called recentering. These instruments are constructed using observed data rather than relying solely on economic theory or exogenous instruments. It is a “smarter” IV because, instead of just simulating how a typical person is affected by policy changes, they:
Predict how each individual is affected by the policy, using their actual characteristics.
Then they adjust (or “recenter”) this prediction so it still qualifies as a valid instrument - i.e., it’s not related to unobserved factors that could mess up the estimate.
Their proposed algorithm that approximates the optimal IV follows a few steps. They mention: “We then propose an algorithm to approximate optimal IVs in practice, focusing on the first two steps: obtaining the best treatment predictor and recentering it. While implementing both steps nonparametrically may be feasible in some settings, in general they represent a high-dimensional problem that may be impractical or infeasible - especially in non-iid data. Instead, we propose using knowledge the researcher has on the treatment formula as well as the “design” (i.e., data-generating process) of the exogenous shocks. First, the researcher predicts the treatment from the shocks and other observables which enter the treatment formula, setting any unobserved or endogenous components of the formula to a base value (such as zero). When there are no unobserved or endogenous components, this prediction is the treatment itself. Second, the researcher recenters this prediction by drawing counterfactual sets of exogenous shocks, following Borusyak and Hull (2023). Optionally residualizing the recentered prediction on covariates yields an approximation to the optimal instrument, up to the heteroskedasticity adjustment that is not popular in practice.”
In summary, these are the steps involved:
Formulating a best guess of treatment from shocks and observables.
A recentering step to ensure exogeneity.
Optional residualization and weighting for efficiency.
In their empirical application, they use this new IV to study how expanding Medicaid eligibility in 2014 (under the ACA) affected private insurance coverage. Their method gives more precise estimates (smaller standard errors) and also reveals that most of the crowd-out effect came from people switching away from direct-purchase insurance (e.g., ACA marketplaces) rather than from employer-sponsored plans - which has important implications for how we think about labour market effects.
This approach doesn’t require any fancy ML or black-box models - just a smart use of the information you already have in your dataset and knowledge of how the treatment formula works. It’s especially helpful when treatments depend on multiple factors (e.g., policy formulas) and when the data aren’t independent and identically distributed (non-iid) - as is common in regional or policy data.
What they do and how it’s different:
Traditional simulated IVs are like saying, “On average, this policy is more generous in State A than in State B, so let’s use that average difference to identify effects.” But Optimal Formula Instruments say: “Let’s be more precise and ask how this exact person, with their specific income and family structure, would be affected if they lived in a different state with different policies.”
This shift matters because many treatments in economics aren’t one-size-fits-all: the same policy can affect people in vastly different ways depending on their characteristics. So building a more personalized instrument (and then recentering it to correct for bias) makes use of this extra information while still satisfying the IV assumptions.
Why is this important?
Instrument strength is a major limitation in many IV applications. This approach leverages additional variation (heterogeneous exposure) without compromising identification.
It provides a formal justification and practical path to improve on standard shift-share or simulated instruments used widely in applied microeconomics.
The method remains valid and powerful even when data are non-iid or high-dimensional, expanding its applicability.
Who should care?
Applied microeconomists using IVs in settings with complex treatment definitions (e.g., eligibility, benefits formulas).
Researchers using simulated instruments, shift-share instruments, or working with non-iid data (e.g., regional economics, policy evaluation).
Methodologists interested in optimal IV construction or semi-parametric efficiency.
Do they have the code/package? I haven’t seen it. While the authors outline a clear algorithm and provide simulation results in the paper, there is no official GitHub repository currently available for implementing the "optimal formula instruments" method. However, Kirill has shared code for related shift-share IV work on GitHub (https://github.com/borusyak), which may be helpful. Researchers can follow the detailed steps in the paper to build their own implementation.
How can we implement it?
In practice, this is how it goes, in three main steps (a rough code sketch follows the list):
Construct the predicted treatment
Use the known formula (e.g., Medicaid eligibility rules) to calculate what the treatment would be under different policy shocks for each individual, using their actual characteristics.
Recenter the treatment prediction
Take the average prediction across many counterfactual versions of the policy shocks (e.g., using permutations or random draws), and subtract this average from the original prediction. This step ensures exogeneity.
Use the recentered prediction as your IV
You can then plug this into your IV regression (optionally adjusting for additional controls or heteroskedasticity for more efficiency).
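To make the three steps above concrete, here is a minimal sketch in R. Everything in it is hypothetical: the data frame df, the eligibility() formula, and the permute-policies-across-states shock design are placeholders standing in for your actual treatment formula and shock design, not the authors' code.

library(AER)  # for ivreg(); fixest::feols() would also work

# df: individual-level data with income, family_size, state, generosity (a
# state-level policy parameter), treatment (medicaid_eligible) and outcome
# (private_insurance). All names are illustrative.

# Hypothetical treatment formula: eligible if income is below a state-specific cap
eligibility <- function(income, family_size, generosity) {
  as.numeric(income <= generosity * family_size)
}

# Step 1: predict the treatment from the realized shocks and observables
df$pred_treat <- eligibility(df$income, df$family_size, df$generosity)

# Step 2: recenter by averaging the prediction over counterfactual shock draws
# (here: reshuffling which state gets which policy, as a stand-in for the
# researcher's actual shock design)
gen_by_state <- tapply(df$generosity, df$state, mean)
state_names  <- names(gen_by_state)
n_draws <- 500
counterfactual_mean <- rowMeans(sapply(seq_len(n_draws), function(s) {
  shuffled <- setNames(sample(gen_by_state), state_names)  # random reassignment of policies
  eligibility(df$income, df$family_size, shuffled[as.character(df$state)])
}))
df$z_recentered <- df$pred_treat - counterfactual_mean

# Step 3: use the recentered prediction as the instrument
fit <- ivreg(private_insurance ~ medicaid_eligible + income + family_size |
               z_recentered + income + family_size, data = df)
summary(fit)

The key point is step 2: the instrument is the individual-specific prediction minus its expectation over counterfactual shock assignments, which is what keeps it uncorrelated with unobservables even though it uses each person’s characteristics.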
In sum, this paper offers a practical and theoretically grounded upgrade to traditional simulated IVs, allowing researchers to harness more variation while preserving identification. Their method is especially relevant for settings where treatment depends on complex formulas and where exposure to shocks varies across individuals. Even without an off-the-shelf package, the step-by-step algorithm makes implementation accessible to applied researchers with a bit of coding. I’d keep an eye out for replication code from the authors, and in the meantime, this framework is definitely something to consider incorporating into your IV toolkit.
Donut Regression Discontinuity Designs
TL;DR: this paper provides a theoretical foundation for donut RD designs, a common robustness check in regression discontinuity (RD) studies where researchers exclude observations close to the cutoff. The authors show that while this approach can guard against manipulation concerns, it comes with significant costs in terms of bias and variance. They provide new tools to evaluate and compare donut and conventional RD estimates rigorously.
What is the paper about?
In RD designs, we estimate treatment effects by comparing observations just above and below a threshold (e.g., birth weight of 1500g for extra medical care). But what if there’s manipulation or measurement error exactly at the cutoff? A common fix is to run a "donut RD", where we remove a small window of observations near the cutoff.
In this paper the authors ask: what does dropping those observations really do to our estimates and inference? And: can we still trust what we find?
What do they do?
The authors do three big things:
Theoretically analyze the costs of donut RD
Removing observations near the cutoff increases bias and variance (sometimes by a lot).
For example, excluding units within 10% of the bandwidth raises the bias by 41–63% and variance by 53–61%, depending on the kernel used.
Show that "bias-aware" confidence intervals still work
Recent bias-aware methods (Armstrong and Kolesár, 2018, 2020; Kolesár and Rothe, 2018) allow for valid inference even with donut RD, as long as you account for the increased uncertainty.
These confidence intervals are longer, but still valid.
Propose new statistical tests to compare donut and conventional RD estimates
They develop tests that account for the dependence between estimates (since most of the data is shared).
One of their tests compares donut estimates to those using only the inner “donut hole” data, and shows better power.
Why is this important?
Donut RD is widely used but often done informally.
This paper offers a formal econometric framework for when and how to use donut RD responsibly.
It shows that donut RD doesn’t always make estimates “more robust” (it can make them less precise, so the trade-offs need to be understood).
Who should care?
Applied economists doing RD who want to run robustness checks on their identification.
Researchers worried about manipulation or bunching at the threshold.
Methodologists interested in nonparametric inference, confidence intervals, or testing RD assumptions.
Do they have code?
Some simulations were run in R using the RDHonest package (this is a great name for a package), but no replication package is linked in this draft. The methodology is compatible with standard RD toolkits, and researchers could implement the bias-aware inference and tests using tools like RDHonest (R) and rdrobust (R and Stata).
How can we implement it?
Run a conventional RD using local linear regression.
Drop observations close to the threshold (e.g., within 3g of a cutoff).
Estimate the donut RD with the same bandwidth.
Compute bias-aware confidence intervals using existing tools or formulas from the paper.
Compare the donut vs. regular RD estimates using the proposed statistical tests.
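As a rough illustration of those steps in R (variable names, the cutoff, the donut radius, and the fixed bandwidth are all placeholders; the paper's comparison tests themselves are not implemented here):

library(rdrobust)
library(RDHonest)   # bias-aware (Armstrong-Kolesár) confidence intervals

cutoff <- 0
donut  <- 3          # e.g., drop observations within 3 units of the cutoff
h      <- 10         # illustrative bandwidth; in practice reuse the conventional one

# 1. Conventional RD with local linear regression
summary(rdrobust(y = df$y, x = df$x, c = cutoff, h = h))

# 2.-3. Donut RD: same specification, excluding the "hole" around the cutoff
hole <- abs(df$x - cutoff) < donut
summary(rdrobust(y = df$y[!hole], x = df$x[!hole], c = cutoff, h = h))

# 4. Bias-aware confidence intervals on the donut sample
# (argument names per my reading of the RDHonest documentation; check your version)
RDHonest(y ~ x, data = df[!hole, ], cutoff = cutoff)

Step 5, the formal donut-vs-conventional comparison, uses the tests derived in the paper, which account for the overlap between the two estimation samples; a naive t-test on the difference of the two estimates would ignore that dependence.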
In sum, this paper takes a widely used empirical practice (the donut RD) and gives it a formal statistical backbone. It shows that while excluding data near the cutoff may help address concerns about manipulation or sorting, it doesn’t come for free: donut RD increases both bias and variance, and standard inference methods may no longer apply. The authors equip researchers with new theory, valid confidence intervals, and practical tests to help decide when donut RD is appropriate, and when it might do more harm than good. For anyone using RD designs in applied work, this paper is an important guide to thinking more carefully about robustness and inference.
Flexible Covariate Adjustments in Regression Discontinuity Designs
TL;DR: this paper introduces a new, more flexible way to use covariates in RDDs. Instead of adding covariates linearly (which can be inefficient - at best - especially with many covariates), the authors propose subtracting an estimated function of the covariates from the outcome variable (possibly using ML techniques such as LASSO - my favourite - RF, DNN, or ensemble combinations) before running a standard RD. This approach improves precision and robustness, even in high-dimensional settings, while staying easy to implement.
What is the paper about?
In RD designs, covariates aren’t necessary for identification, but they’re often included to reduce variance. The common way to do this is to include them linearly and globally (i.e., not localized by distance to the cutoff), which can be inefficient - especially with many covariates or nonlinear relationships.
This paper proposes a more general approach: instead of including covariates in the regression, subtract a flexible function of covariates from the outcome, then run a standard RD on this adjusted outcome. This function can be estimated using machine learning (lasso, forests, boosting, neural nets) or more traditional nonparametric methods. The key is to choose the function that best captures the part of the outcome that’s predictable from covariates but not related to treatment.
What do the authors do?
Theoretical contribution: they characterize the optimal covariate adjustment → a function that minimizes asymptotic variance. They show that the adjusted RD estimator remains consistent and asymptotically normal, even if this function is misspecified or estimated at slow (nonparametric) rates. Their method enjoys a stronger version of Neyman orthogonality, meaning it’s robust to errors in the first stage.
Practical implementation: estimate a function η(Z) predicting the outcome from covariates, then subtract η(Z) from Y to get a new “de-noised” outcome, and finally run the usual local linear RD using this adjusted outcome (a rough sketch follows below).
Cross-fitting: to avoid overfitting, they use cross-fitting (sample splitting), a best practice in double ML, and they also offer two variants: localized and global, depending on whether the ML algorithm focuses near the cutoff or uses the full sample.
Software and methods: their ensemble uses linear models, post-lasso, boosted trees, and random forests, with weights chosen via super learner. It’s implemented in R (another point for my RStats gang), but easy ( :) ) to adapt elsewhere.
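Here is a rough sketch of the global variant, using a single learner (a random forest via ranger) in place of the authors' ensemble; df, the covariate names, and the number of folds are placeholders, and this is my reading of the procedure rather than their code.

library(ranger)     # one possible first-stage learner; the paper uses an ensemble
library(rdrobust)

covars <- c("z1", "z2", "z3")   # covariates, not including the running variable
K <- 5
folds <- sample(rep(1:K, length.out = nrow(df)))
eta_hat <- numeric(nrow(df))

# Cross-fitting: predict the outcome from covariates using only data outside each fold
for (k in 1:K) {
  fit_k <- ranger(reformulate(covars, response = "y"), data = df[folds != k, ])
  eta_hat[folds == k] <- predict(fit_k, data = df[folds == k, ])$predictions
}

# Subtract the prediction to get the "de-noised" outcome, then run the usual RD
df$y_adj <- df$y - eta_hat
summary(rdrobust(y = df$y_adj, x = df$x, c = 0))

Because the adjusted outcome simply replaces Y, the usual bandwidth selection and robust bias-corrected inference in rdrobust carry over unchanged - exactly the compatibility point the authors emphasize.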
Why is this important?
Improves precision over conventional linear covariate adjustments, especially in high-dimensional settings.
Makes RD designs more robust without complicating the estimation procedure.
Compatible with existing RD software and bandwidth selection methods → just replace Y with the adjusted outcome.
Empirical performance
They reanalyze 56 RD specifications from 16 published economics papers. Key findings: in about half of the cases, adding covariates linearly didn’t reduce confidence intervals much; their flexible method achieved up to 30% shorter confidence intervals, equivalent to doubling the sample size; and even modest improvements (e.g., 10–20%) are common and valuable in practice. They also show in simulations that the method works well under different sample sizes and covariate counts.
Who should care?
Applied researchers using RD who want to improve statistical power or precision.
People working with many covariates or nonlinear relationships.
Methodologists and anyone using ML in causal inference.
Do they have code?
Yes, they implemented the method in R, using xgboost, ranger, hdm, and SuperLearner. However, no public GitHub repo is linked (as of now), so direct replication would require reconstructing based on the paper and supplement.
In sum, this paper offers a powerful yet intuitive improvement to covariate adjustment in RD designs. By using modern prediction tools to “de-noise” the outcome before estimation, researchers can achieve greater precision, especially in high-dimensional or nonlinear settings. The method is simple to implement, robust to estimation error, and fully compatible with standard RD tools, which makes it a valuable addition to the applied econometrician’s toolbox. For those looking to get more out of their data without sacrificing identification, this approach delivers both flexibility and efficiency.
A Partial Linear Estimator for Small Study Regression Discontinuity Designs
TL;DR: this paper revisits and revives an older method - Partial Linear Estimation (PLE) - for RDD, showing that it can outperform standard RD methods in small-sample or sparse designs, which are common in education and other policy evaluations. The authors modify and implement this estimator with new bandwidth and variance selection tools, and show in simulations that it's highly competitive (often better) when data near the cutoff is limited.
What is the paper about?
Most RD studies use what’s called local polynomial estimation (LPE). Think of it as fitting two separate regression lines: one just below the cutoff, one just above. Then, you compare their values right at the threshold to estimate the treatment effect.
But when you have a small number of observations near the cutoff, this method can become unstable - those two separate lines rely heavily on a few data points and can give noisy or biased results. What the authors propose is to use a method called the Partial Linear Estimator (PLE). Instead of fitting two separate lines, PLE fits a single smooth curve across the whole running variable (both sides of the cutoff). This curve is flexible - it adjusts for the general relationship between the running variable and the outcome - but it also includes a separate term that captures the treatment effect at the cutoff. It’s like saying: “let’s model the overall trend smoothly across the data, and then estimate the jump at the cutoff as a separate component.” The benefit is that this uses information from the entire sample and avoids the issue of "boundary bias" that arises from fitting two separate regressions at the edge of the data. That makes it more stable and precise, especially when you’re working with limited data close to the threshold, which is common in many education or policy settings.
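To see the difference in spirit (not the authors' actual estimator, which uses local polynomial weights and its own bandwidth selector), here is a back-of-the-envelope contrast in R between the two-separate-regressions approach and a partial linear fit, using a penalized spline from mgcv as the smooth component; df, the bandwidth h, and the cutoff at zero are placeholders.

library(mgcv)

h <- 10                               # illustrative bandwidth
df$d <- as.numeric(df$x >= 0)         # treatment indicator (x centered at the cutoff)

# Local polynomial flavor: two separate lines, compared at the cutoff
left  <- lm(y ~ x, data = subset(df, x < 0 & x > -h))
right <- lm(y ~ x, data = subset(df, x >= 0 & x < h))
tau_lpe <- unname(coef(right)[1] - coef(left)[1])

# Partial linear flavor: one smooth curve over the whole running variable,
# plus a separate jump term at the cutoff
ple <- gam(y ~ d + s(x), data = df)
tau_ple <- unname(coef(ple)["d"])

The first estimate leans entirely on the (possibly few) observations inside the bandwidth; the second borrows strength from the whole sample to pin down the smooth trend and only asks the data to identify the jump at the cutoff, which is where the small-sample gains come from.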
What do they do?
They extend and modernize Porter’s estimator by using local polynomial regression weights instead of local constant ones, pairing it with a new bandwidth selection algorithm (SM) based on an asymptotic MSE criterion, and using jackknife-based standard errors for inference (which are shown to perform well in small samples).
They then simulate performance in small-sample scenarios: they compare their estimator to standard methods like CV/IK, FLCI/AK, and local randomization (LR); across 4 different data generating processes (DGPs) and multiple sample sizes, their PLE method consistently performs well, especially PLE with IK bandwidth.
Finally, they apply the method to real school accountability data: they analyze scores from Indiana schools just above/below the failing threshold. Although the total sample has 1,933 schools, only 88 are near the cutoff - typical of RD sparsity. PLE estimates a small, non-significant negative treatment effect (the opposite of what policymakers would hope), highlighting the method’s usability even with thin data.
Why is this important?
Many applied RD settings (especially in education, public policy, or subgroup analysis) suffer from low effective sample sizes near the cutoff. Standard RD methods assume large samples or dense data around the threshold, which may not hold in practice. PLE offers greater stability in small samples, it avoids boundary bias from fitting separate models, and outperforms popular alternatives like FLCI and LR in realistic small-sample settings.
Who should care?
Applied researchers working with small RD samples (e.g., schools, villages, programs with eligibility thresholds).
Economists and statisticians interested in practical improvements to RD methods.
Policy evaluators using RD designs where most units are far from the threshold.
Do they have code?
Yes, the authors have implemented the method in an R package called rdple, which is available on GitHub (not CRAN). The package includes functions for estimating the Partial Linear Estimator (PLE), selecting bandwidths, and computing standard errors. To install it, you’ll need the devtools package:
install.packages("devtools")
devtools::install_github("DSwartzy/rdple")
Once installed, you can use it to apply the PLE method to your own RD data, especially in small-sample contexts.
How can we implement it?
Install and load the rdple package from GitHub.
Choose a bandwidth: use either the authors’ proposed SM (smoothness-based) bandwidth selector, designed specifically for PLE, or a standard one like IK (Imbens-Kalyanaraman). Both bandwidths are compatible with the method; PLE with the IK bandwidth performs particularly well in simulations.
Estimate the treatment effect: use local polynomial regression weights (typically local linear, i.e., degree = 1). The estimator fits a single smooth function across the running variable and estimates the treatment effect separately at the cutoff.
Compute standard errors: the recommended variance estimator is jackknife-based, specifically the one built on Wu (1986), which removes one residual at a time and has shown good performance in small samples. Other options are discussed (e.g., Hinkley’s method, direct plug-in), but jackknife on residuals is preferred due to stability and robustness.
Construct confidence intervals: use the jackknife variance estimate to build 95% intervals.
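For intuition about the last two steps, here is a crude delete-one jackknife applied to the illustrative gam-based fit from the sketch earlier in this section (the paper's recommended estimator deletes residuals following Wu (1986) and is implemented in rdple; this is only a stand-in):

# Delete-one jackknife of the illustrative PLE estimate tau_ple from the sketch above
# (slow for large n; for real work use rdple's residual-based jackknife)
n <- nrow(df)
tau_jk <- vapply(seq_len(n), function(i) {
  fit_i <- gam(y ~ d + s(x), data = df[-i, ])
  unname(coef(fit_i)["d"])
}, numeric(1))
se_jk <- sqrt((n - 1) / n * sum((tau_jk - mean(tau_jk))^2))
ci_95 <- tau_ple + c(-1.96, 1.96) * se_jk      # 95% confidence interval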
In sum, this paper revisits an underused method (PLE) and shows that it can be a highly effective alternative to standard RD estimators, particularly in small-sample or data-sparse settings. By fitting a smooth function across the entire running variable and estimating the treatment effect at the cutoff, the method avoids common issues like boundary bias and instability near the threshold. The authors modernize the estimator with better bandwidth selection and robust variance estimation, and they provide an easy-to-use R package for implementation. For researchers facing limited data near RD cutoffs, this approach offers a practical and reliable tool that often outperforms conventional techniques.
Treatment Effect Heterogeneity in Regression Discontinuity Designs
TL;DR: this paper develops a rigorous econometric framework for analyzing heterogeneous treatment effects in RDDs. It formalizes the most common empirical practice (interacting treatment with covariates in local linear regressions) and shows when and how this approach recovers causally interpretable conditional effects. It also provides tools for estimation, robust bias-corrected inference, and optimal bandwidth selection, all implemented in a companion R package, rdhte → everything we love.
What is the paper about?
In practice, researchers often want to know: “does the treatment effect vary across different groups (e.g., by income, gender, or region)?” In RDDs, these subgroup analyses are typically done by adding interaction terms between the treatment and covariates (like income or education level) in a local linear regression. However, until now, there has been no formal framework validating this strategy, despite its widespread use. This paper provides that foundation. It shows when and how local linear regressions with interactions can be used to identify causal heterogeneous effects, and it clarifies what can and cannot be interpreted causally, especially when working with continuous covariates. The authors also develop tools for robust bias-corrected inference, optimal bandwidth selection, and group comparison tests, offering a unified approach to RD heterogeneity analysis.
What do the authors do?
They begin by showing that when the covariate is discrete (e.g., income quartiles), causal subgroup treatment effects are identified without needing additional assumptions. When the covariate is continuous, however, causal identification requires a semiparametric structure - specifically, that the treatment effect varies linearly with the covariate at the cutoff. This assumption ensures that the estimated heterogeneity reflects meaningful variation rather than noise or misspecification. Under this structure, the authors define conditions under which a standard local linear RD with interactions recovers the Conditional Average Treatment Effect (CATE) at the threshold.
They also derive optimal bandwidth formulas tailored to heterogeneity targets, whether estimating group-specific effects or differences between groups. While having separate bandwidths for each group is theoretically optimal, they show that using a common bandwidth is often justified and simplifies implementation. For inference, the authors provide bias-corrected confidence intervals using the robust methods developed in Calonico et al. (2014), and extend these tools to allow for clustered standard errors.
In their empirical illustration, they reanalyze a well-known RD study (Akhtari et al., 2022) on political turnover in Brazilian mayoral elections. They investigate treatment effect heterogeneity in headmaster replacement, using municipal income as the moderator, both discretized (via median, quartiles, deciles) and continuous. They find that heterogeneous effects are statistically significant among lower-income municipalities, and that modeling income continuously yields similar patterns with greater efficiency.
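The practice the paper formalizes is just an interacted local linear regression; a stripped-down version for a binary moderator looks like the sketch below (illustrative bandwidth and triangular kernel weights; for real work use rdhte, which adds the optimal bandwidths and robust bias-corrected inference discussed above).

# df: outcome y, running variable x centered at the cutoff, and a binary
# moderator g coded 0/1 (e.g., below-median municipal income)
df$d <- as.numeric(df$x >= 0)          # treatment indicator at the cutoff
h <- 10                                # illustrative bandwidth
w <- pmax(0, 1 - abs(df$x) / h)        # triangular kernel weights within the bandwidth

fit <- lm(y ~ d * x * g, data = df, weights = w)
summary(fit)
# coefficient on d   : RD effect at the cutoff for the g = 0 group
# coefficient on d:g : difference in the effect between the two groups

The paper's contribution is to show when this regression has a causal interpretation and to pair it with the bandwidth selection and bias-corrected confidence intervals needed for valid inference, which the naive lm output above does not provide.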
Why is this important?
Heterogeneity is central to policy → knowing who benefits most (or least) informs targeting and fairness. Many RD studies analyze subgroup effects informally; this paper standardizes and validates the most common empirical practice. It provides rigorous conditions for causal interpretation and the tools to do valid inference.
Who should care?
Applied economists and political scientists using RD designs with subgroup analysis.
Researchers working with discrete or continuous covariates in RD settings.
Anyone doing covariate-interacted local linear RD, especially for policy heterogeneity.
Do they have code?
Yes. The companion R package is called rdhte, available here. It implements all the estimation and inference tools described in the paper, including linear interaction models, optimal bandwidth selectors, robust bias-corrected confidence intervals, and heterogeneity tests.
In sum, this paper offers a comprehensive framework for analyzing treatment effect heterogeneity in RD designs, bridging the gap between common empirical practice and formal identification theory. It clarifies when covariate interactions in local linear RD regressions yield causally interpretable effects, provides practical tools for estimation, inference, and bandwidth selection, and delivers everything in a ready-to-use R package. For researchers aiming to uncover who benefits most from treatment, this paper turns an informal add-on into a rigorous and reliable strategy.
Learning Conditional Average Treatment Effects in Regression Discontinuity Designs using Bayesian Additive Regression Trees
TL;DR: this paper introduces BARDDT, a purpose-built BART model for RDDs that estimates Conditional Average Treatment Effects (CATE) at the cutoff, conditional on covariates. It outperforms standard BART, local polynomial RD, and CART-based alternatives, especially when treatment effects vary across units! It has everything we love: plots, trees, matrices.
What is this paper about?
Most RD studies estimate the average treatment effect at the cutoff—but we often care about how that effect varies across people. For example, do students with lower high school GPAs respond differently to academic probation than students with stronger academic records?
This paper introduces a new method called BARDDT (Bayesian Additive Regression Trees for Discontinuity Treatment Effects) that helps answer exactly that. It’s a flexible, data-driven tool that can estimate how the treatment effect at the cutoff varies depending on someone’s characteristics—without needing to pre-specify the subgroups in advance.
In simple terms:
BARDDT looks for patterns in who responds more or less to the treatment, based on the covariates you have (e.g., age, gender, baseline performance).
It works like a very smart version of splitting your sample into subgroups, except it does this automatically, based on where the data shows meaningful differences.
Unlike standard regression trees or off-the-shelf machine learning tools, it’s specifically built for the RD setting: it respects the discontinuity structure and estimates heterogeneity right at the cutoff.
What do the authors do?
They develop BARDDT, a version of Bayesian Additive Regression Trees (BART) adapted for RDDs.
The model:
Fits smooth curves instead of flat segments within each tree.
Splits on both the running variable and individual covariates (e.g., gender, GPA).
Directly estimates individual-level treatment effects at the cutoff.
Run extensive simulations:
Compare BARDDT to standard BART, local polynomial RD, and tree-based CATE methods.
BARDDT consistently delivers lower bias and better CATE recovery, especially when relationships are nonlinear.
Apply the method to academic probation data:
Estimate how probation affects GPA for different types of students.
Find larger effects among students with low prior GPA or lighter course loads.
Why it matters
Heterogeneous treatment effects are often what policy cares about. Existing RD methods usually assume constant effects at the cutoff or do manual subgroup analysis. BARDDT offers a principled, flexible way to uncover causal heterogeneity without needing to specify subgroups ahead of time. It brings the benefits of ML to RD while still respecting identification assumptions.
Who should care?
Applied researchers using RDDs who want to understand which subgroups respond most to treatment.
Economists and data scientists working on personalized policy effects (e.g., education, health, labor).
Anyone interested in combining causal inference and ML, especially for structured designs like RD.
Do they have code?
Yes! The authors provide an open-source package called stochtree (R and Python). It includes: the BARDDT implementation, simulation code, and the academic probation replication.
How to implement it
Prepare your RD dataset with a running variable and relevant covariates.
Standardize the running variable (the model expects this).
Fit BARDDT using the stochtree package.
Estimate the CATE at the cutoff, conditional on covariates.
Visualize or summarize heterogeneity (e.g., using tree summaries of CATEs or marginal effects).
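I have not tried stochtree’s BARDDT interface myself, so rather than guess at its API, here is a generic sketch of the underlying idea using the dbarts package: fit a flexible model of the outcome on the running variable, a treatment indicator, and covariates, then predict for every unit at the cutoff from both sides and difference the predictions to get covariate-conditional effects. BARDDT does this with leaf regressions built for the RD structure, so treat this as intuition only, not the authors' estimator; df and the covariates z1, z2 are placeholders.

library(dbarts)

# df: outcome y, standardized running variable x (cutoff at 0), covariates z1, z2
xtrain <- data.frame(x = df$x, d = as.numeric(df$x >= 0), z1 = df$z1, z2 = df$z2)

# Evaluation points: every unit placed exactly at the cutoff, once on each side
at_cut_1 <- transform(xtrain, x = 0, d = 1)
at_cut_0 <- transform(xtrain, x = 0, d = 0)

fit <- bart(x.train = xtrain, y.train = df$y,
            x.test = rbind(at_cut_1, at_cut_0), verbose = FALSE)

n <- nrow(df)
pred <- colMeans(fit$yhat.test)                  # posterior mean for each test row
cate_hat <- pred[1:n] - pred[(n + 1):(2 * n)]    # unit-level effect at the cutoff

# Last step: explore heterogeneity, e.g. against prior GPA
plot(df$z1, cate_hat, xlab = "prior GPA", ylab = "estimated effect at the cutoff")

From there, the workflow in the paper summarizes the posterior CATEs, e.g. with tree summaries or marginal effects, as in the final step of the list above.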
In sum, this paper shows how to bring modern ML techniques into the world of RDD, without breaking the causal assumptions that make RD attractive. By customizing BART to the RD context, the authors give researchers a new way to estimate and explore treatment heterogeneity at the margin. If you’ve been doing RD-by-subgroup or linear interactions, this is a powerful, flexible alternative, especially when you don’t know in advance where the differences lie. Do we ever?