Regression adjustment in experiments

Analysts who conduct experiments often wish to use regression adjustment with a linear model to improve the precision of their estimate of the treatment effect. Unfortunately, regression adjustment can introduce small-sample bias and other undesirable properties (Freedman 2008). Lin (2013) proposes a simple strategy to fix these problems in sufficiently large samples:

  1. Center all predictors by subtracting each of their means.
  2. Estimate a linear model in which the treatment is interacted with each of the covariates.
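The two steps above can be implemented by hand with base R's `lm()`. This is a minimal sketch using simulated data (the variable names and data-generating process are illustrative, not taken from the example below):

```r
# Simulated experiment: binary treatment, two covariates, known effect of 2.
set.seed(42)
n <- 200
dat <- data.frame(
    treat = rbinom(n, 1, 0.5),
    age   = rnorm(n, 35, 10),
    educ  = rnorm(n, 12, 2))
dat$y <- 1 + 2 * dat$treat + 0.5 * dat$age + rnorm(n)

# Step 1: center each covariate at its sample mean.
dat$age_c  <- dat$age  - mean(dat$age)
dat$educ_c <- dat$educ - mean(dat$educ)

# Step 2: interact the treatment with every centered covariate.
fit <- lm(y ~ treat * (age_c + educ_c), data = dat)

# Because the covariates are centered, the coefficient on `treat`
# estimates the average treatment effect.
coef(fit)["treat"]
```

Centering is what makes the `treat` coefficient directly interpretable: with uncentered covariates, that coefficient would instead be the effect at covariate values of zero.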

The estimatr package includes a convenient function to implement this strategy:

library(estimatr)
library(marginaleffects)
lalonde <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/MatchIt/lalonde.csv")

mod <- lm_lin(
    re78 ~ treat,
    covariates = ~ age + educ + race,
    data = lalonde,
    se_type = "HC3")
summary(mod)
#> 
#> Call:
#> lm_lin(formula = re78 ~ treat, covariates = ~age + educ + race, 
#>     data = lalonde, se_type = "HC3")
#> 
#> Standard error type:  HC3 
#> 
#> Coefficients:
#>                    Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
#> (Intercept)         6488.05     356.71 18.1885 2.809e-59  5787.50   7188.6 604
#> treat                489.73     878.52  0.5574 5.774e-01 -1235.59   2215.0 604
#> age_c                 85.88      35.42  2.4248 1.561e-02    16.32    155.4 604
#> educ_c               464.04     131.51  3.5286 4.495e-04   205.77    722.3 604
#> racehispan_c        2775.47    1155.40  2.4022 1.660e-02   506.38   5044.6 604
#> racewhite_c         2291.67     793.30  2.8888 4.006e-03   733.71   3849.6 604
#> treat:age_c           17.23      76.37  0.2256 8.216e-01  -132.75    167.2 604
#> treat:educ_c         226.71     308.43  0.7350 4.626e-01  -379.02    832.4 604
#> treat:racehispan_c -1057.84    2652.42 -0.3988 6.902e-01 -6266.92   4151.2 604
#> treat:racewhite_c  -1205.68    1805.21 -0.6679 5.045e-01 -4750.92   2339.6 604
#> 
#> Multiple R-squared:  0.05722 ,   Adjusted R-squared:  0.04317 
#> F-statistic: 4.238 on 9 and 604 DF,  p-value: 2.424e-05

We can obtain the same results by fitting a model with the standard lm function and then calling the avg_comparisons() function:

mod <- lm(re78 ~ treat * (age + educ + race), data = lalonde)
avg_comparisons(
    mod,
    variables = "treat",
    vcov = "HC3")
#> 
#>   Term Contrast Estimate Std. Error      z Pr(>|z|) 2.5 % 97.5 %
#>  treat    1 - 0    489.7      878.5 0.5574  0.57722 -1232   2212
#> 
#> Prediction type:  response 
#> Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

Notice that the treat coefficient and associated standard error from the lm_lin regression are exactly the same as the estimates produced by the avg_comparisons() function.

References

  • Freedman, David A. “On Regression Adjustments to Experimental Data.” Advances in Applied Mathematics 40, no. 2 (February 2008): 180–93.
  • Lin, Winston. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” Annals of Applied Statistics 7, no. 1 (March 2013): 295–318. https://doi.org/10.1214/12-AOAS583.