The code in this vignette requires marginaleffects version 0.6.0 or the development version hosted on GitHub.

This vignette introduces the deltamethod() function and the hypothesis argument of the comparisons(), marginaleffects(), and predictions() functions. These features allow users to conduct linear and non-linear hypothesis tests and to compute custom contrasts (linear combinations) between parameters.

Delta method

Version 0.6.0 of marginaleffects includes a very simple yet powerful function called deltamethod(). This function emulates the behavior of the well-established car::deltaMethod and car::linearHypothesis functions, but it supports more models, requires fewer dependencies, and offers some convenience features like shortcuts for robust standard errors.

deltamethod() can be used to compute estimates and standard errors of arbitrary functions of model parameters. For example, it can be used to conduct tests of equality between coefficients, or to test the value of some linear or non-linear combination of quantities of interest. deltamethod() can also be used to conduct hypothesis tests on other functions of a model's parameters, such as adjusted predictions or marginal effects.

Let’s start by estimating a simple model:

library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)

When the FUN and hypothesis arguments of deltamethod() equal NULL (the default), the function returns a data.frame of raw estimates:

deltamethod(mod)
#>           term    estimate
#> 1  (Intercept) 35.84599532
#> 2           hp -0.02311981
#> 3           wt -3.18140405
#> 4 factor(cyl)6 -3.35902490
#> 5 factor(cyl)8 -3.18588444

Test of equality between coefficients:

deltamethod(mod, "hp = wt")
#>      term estimate std.error statistic      p.value conf.low conf.high
#> 1 hp = wt 3.158284 0.7199081  4.387066 1.148899e-05  1.74729  4.569278
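
The reported estimate corresponds to the difference between the two coefficients shown above, which we can verify by hand (point estimate only):

# manual check of the point estimate: hp coefficient minus wt coefficient
unname(coef(mod)["hp"] - coef(mod)["wt"])
#> [1] 3.158284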

Non-linear function of coefficients:

deltamethod(mod, "exp(hp + wt) = 0.1")
#>                 term    estimate  std.error statistic    p.value   conf.low
#> 1 exp(hp + wt) = 0.1 -0.05942178 0.02919718 -2.035189 0.04183184 -0.1166472
#>      conf.high
#> 1 -0.002196363

The vcov argument behaves in the same way as in the marginaleffects() function. It allows us to easily compute robust standard errors:

deltamethod(mod, "hp = wt", vcov = "HC3")
#>      term estimate std.error statistic      p.value conf.low conf.high
#> 1 hp = wt 3.158284 0.8051929  3.922394 8.767334e-05 1.580135  4.736433

We can use shortcuts like b1, b2, ... to refer to estimates by their position in the output (including the output of FUN, described below). For example, b2 = b3 is equivalent to hp = wt because those term names appear in the 2nd and 3rd rows when we call deltamethod(mod).

deltamethod(mod, "b2 = b3")
#>      term estimate std.error statistic      p.value conf.low conf.high
#> 1 b2 = b3 3.158284 0.7199081  4.387066 1.148899e-05  1.74729  4.569278

Term names with special characters must be enclosed in backticks:

deltamethod(mod, "`factor(cyl)6` = `factor(cyl)8`")
#>                              term   estimate std.error  statistic  p.value
#> 1 `factor(cyl)6` = `factor(cyl)8` -0.1731405  1.653923 -0.1046847 0.916626
#>   conf.low conf.high
#> 1 -3.41477   3.06849

The FUN argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a numeric vector or a data.frame with two columns named term and estimate.

mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)

f <- function(x) predict(x, type = "link", newdata = mtcars)
p <- deltamethod(mod, FUN = f)
head(p)
#>   term   estimate std.error  statistic    p.value  conf.low  conf.high
#> 1   b1 -1.0983601 0.7160423 -1.5339319 0.12504640 -2.501777  0.3050570
#> 2   b2 -1.0983601 0.7160423 -1.5339319 0.12504640 -2.501777  0.3050570
#> 3   b3  0.2331884 0.7808207  0.2986452 0.76521076 -1.297192  1.7635688
#> 4   b4 -0.5945143 0.6471012 -0.9187346 0.35823441 -1.862809  0.6737808
#> 5   b5 -0.4175761 0.6474633 -0.6449417 0.51896494 -1.686581  0.8514287
#> 6   b6 -5.0264654 2.1949096 -2.2900558 0.02201808 -9.328409 -0.7245217

Test of equality between two predictions (row 2 vs row 3):

f <- function(x) predict(x, newdata = mtcars)
deltamethod(mod, FUN = f, hypothesis = "b2 = b3")
#>      term  estimate std.error statistic    p.value  conf.low  conf.high
#> 1 b2 = b3 -1.331548 0.6155513  -2.16318 0.03052731 -2.538007 -0.1250901
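
Because predict() on this model defaults to the link scale, these are the same predictions we stored in p above, so the point estimate should match the manual difference between rows 2 and 3:

# manual check of the point estimate, using the p object computed earlier
p$estimate[2] - p$estimate[3]
#> [1] -1.331548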

Note that we specified the newdata argument inside the f function. This is because the model's predict() method will automatically return the original fitted values when newdata is NULL, instead of the slightly perturbed fitted values that we need to compute numerical derivatives for the delta method.

We can also use numeric vectors to specify linear combinations of parameters. For example, there are 3 coefficients in the last model we estimated. To test the null hypothesis that the sum of the 2nd and 3rd coefficients is equal to 0, we can do:

deltamethod(mod, hypothesis = c(0, 1, 1))
#>     term estimate std.error statistic    p.value  conf.low conf.high
#> 1 custom 1.314659 0.5927081  2.218055 0.02655107 0.1529728  2.476346
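
The weights c(0, 1, 1) pick out the 2nd and 3rd coefficients, so the reported estimate is simply their sum, which we can verify by hand:

# manual check of the point estimate: linear combination of the coefficients
sum(coef(mod) * c(0, 1, 1))
#> [1] 1.314659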

See below for more examples of how to use string formulas, numeric vectors, or matrices to calculate custom contrasts, linear combinations, and linear or non-linear hypothesis tests.

hypothesis Formulas

Each of the 4 core functions of the package supports a hypothesis argument which behaves similarly to the one in the deltamethod() function. This argument allows users to specify custom hypothesis tests and contrasts, in order to test null hypotheses such as:

  • The coefficients \(\beta_1\) and \(\beta_2\) are equal.
  • The marginal effects of \(X_1\) and \(X_2\) are equal.
  • The marginal effect of \(X\) when \(W=0\) is equal to the marginal effect of \(X\) when \(W=1\).
  • A non-linear function of adjusted predictions is equal to 100.
  • The marginal mean in the control group is equal to the average of marginal means in the other 3 treatment arms.
  • Cross-level contrasts: In a multinomial model, the effect of \(X\) on the 1st outcome level is equal to the effect of \(X\) on the 2nd outcome level.

Marginal effects

For example, let’s fit a model and compute some marginal effects at the mean:

library(marginaleffects)

mod <- lm(mpg ~ am + vs, data = mtcars)

marginaleffects(mod, newdata = "mean")
#>   rowid     type term     dydx std.error statistic      p.value conf.low
#> 1     1 response   am 6.066667  1.274842  4.758758 1.947875e-06 3.568022
#> 2     1 response   vs 6.929365  1.262132  5.490207 4.014620e-08 4.455632
#>   conf.high      am     vs   eps
#> 1  8.565312 0.40625 0.4375 1e-04
#> 2  9.403098 0.40625 0.4375 1e-04

Is the marginal effect of am different from the marginal effect of vs? To answer this question we can run a linear hypothesis test:

library(marginaleffects)

mod <- lm(mpg ~ am + vs, data = mtcars)

marginaleffects(
    mod,
    newdata = "mean",
    hypothesis = "am = vs")
#>       type  term       dydx std.error  statistic   p.value conf.low conf.high
#> 1 response am=vs -0.8626984  1.939057 -0.4449063 0.6563875 -4.66318  2.937783
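
The estimate is simply the difference between the two marginal effects computed above:

6.066667 - 6.929365
#> [1] -0.862698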

The hypothesis string can include any valid R expression, so we can run some silly non-linear tests:

marginaleffects(
    mod,
    newdata = "mean",
    hypothesis = "exp(am) - 2 * vs = -400")
#>       type              term     dydx std.error statistic   p.value  conf.low
#> 1 response exp(am)-2*vs=-400 817.3821  550.2221  1.485549 0.1373984 -261.0335
#>   conf.high
#> 1  1895.798

Adjusted Predictions

Now consider the case of adjusted predictions:

p <- predictions(
    mod,
    newdata = datagrid(am = 0:1, vs = 0:1))
p
#>   rowid     type predicted std.error statistic       p.value conf.low conf.high
#> 1     1 response  14.59444 0.9261514  15.75816  6.036286e-56 12.70025  16.48864
#> 2     2 response  21.52381 1.1300269  19.04717  6.935682e-81 19.21265  23.83497
#> 3     3 response  20.66111 1.1830035  17.46496  2.648808e-68 18.24160  23.08063
#> 4     4 response  27.59048 1.1300269  24.41577 1.163052e-131 25.27931  29.90164
#>   am vs
#> 1  0  0
#> 2  0  1
#> 3  1  0
#> 4  1  1

Since there is no term column in the output of the predictions function, we must use parameter identifiers like b1, b2, etc. to determine which estimates we want to compare:

predictions(
    mod,
    hypothesis = "b1 = b2",
    newdata = datagrid(am = 0:1, vs = 0:1))
#>       type  term predicted std.error statistic     p.value  conf.low conf.high
#> 1 response b1=b2 -6.929365  1.262132 -5.490208 4.01461e-08 -9.403098 -4.455633

p$predicted[1] - p$predicted[2]
#> [1] -6.929365

In the next section, we will see that we can get equivalent results by using a vector of contrast weights, which will be used to compute a linear combination of estimates:

predictions(
    mod,
    hypothesis = c(1, -1, 0, 0),
    newdata = datagrid(am = 0:1, vs = 0:1))
#>       type   term predicted std.error statistic     p.value  conf.low conf.high
#> 1 response custom -6.929365  1.262132 -5.490208 4.01461e-08 -9.403098 -4.455633

There are many more possibilities:

predictions(
    mod,
    hypothesis = "b1 + b2 = 30",
    newdata = datagrid(am = 0:1, vs = 0:1))
#>       type     term predicted std.error statistic      p.value conf.low
#> 1 response b1+b2=30  6.118254  1.635988   3.73979 0.0001841737 2.911776
#>   conf.high
#> 1  9.324732

p$predicted[1] + p$predicted[2] - 30
#> [1] 6.118254

predictions(
    mod,
    hypothesis = "(b2 - b1) / (b3 - b2) = 0",
    newdata = datagrid(am = 0:1, vs = 0:1))
#>       type              term predicted std.error  statistic   p.value  conf.low
#> 1 response (b2-b1)/(b3-b2)=0 -8.032199  16.96625 -0.4734223 0.6359119 -41.28543
#>   conf.high
#> 1  25.22103

Average contrasts or marginal effects

The standard workflow with the marginaleffects package is to first call a function like marginaleffects() or comparisons() to compute unit-level quantities, and then to call summary() to aggregate them into “Average Marginal Effects” or “Average Contrasts.” Unfortunately, the summary() and tidy() functions do not have a hypothesis argument, so we cannot use them to conduct hypothesis tests on aggregated quantities. Instead, we can use the transform_pre argument to emulate the behavior of summary() and obtain the averaged quantities in a single step.

First, note that the following two commands produce the same results:

comparisons(mod) |>
    tidy()
#>       type term contrast estimate std.error statistic      p.value conf.low
#> 1 response   am    1 - 0 6.066667  1.274842  4.758759 1.947871e-06 3.568022
#> 2 response   vs    1 - 0 6.929365  1.262132  5.490208 4.014610e-08 4.455633
#>   conf.high
#> 1  8.565312
#> 2  9.403098

comparisons(
    mod,
    transform_pre = "differenceavg")
#>       type term          contrast comparison std.error statistic      p.value
#> 1 response   am mean(1) - mean(0)   6.066667  1.274842  4.758759 1.947871e-06
#> 2 response   vs mean(1) - mean(0)   6.929365  1.262132  5.490208 4.014610e-08
#>   conf.low conf.high
#> 1 3.568022  8.565312
#> 2 4.455633  9.403098

Notice that the second command did not require a call to tidy() or summary(), and yet we still obtained average contrasts. See the transformations section of the Contrasts vignette for more details.

With these results in hand, we can finally conduct a linear hypothesis test between average marginal effects:

comparisons(
    mod,
    hypothesis = "am = vs",
    transform_pre = "differenceavg")
#>       type  term comparison std.error  statistic   p.value  conf.low conf.high
#> 1 response am=vs -0.8626984  1.939056 -0.4449063 0.6563875 -4.663179  2.937782

Computing contrasts between average marginal effects requires a little care to obtain the right scale. In particular, we need to specify both the variables and the transform_pre arguments:

comparisons(
    mod,
    hypothesis = "am = vs",
    variables = c("am", "vs"),
    transform_pre = "dydxavg")
#>       type  term comparison std.error  statistic   p.value conf.low conf.high
#> 1 response am=vs -0.8626984  1.939057 -0.4449062 0.6563875 -4.66318  2.937783

hypothesis Vectors and Matrices

The marginalmeans() function computes estimated marginal means. The hypothesis argument of that function offers a powerful mechanism to estimate custom contrasts between marginal means, by way of linear combination.

Consider a simple example:

library(marginaleffects)
library(emmeans)
library(nnet)

dat <- mtcars
dat$carb <- factor(dat$carb)
dat$cyl <- factor(dat$cyl)
dat$am <- as.logical(dat$am)

mod <- lm(mpg ~ carb + cyl, dat)
mm <- marginalmeans(mod, variables = "carb")
mm
#>   term value marginalmean std.error conf.low conf.high      p.value statistic
#> 1 carb     1     21.66232  1.438417 18.84307  24.48156 2.975538e-51 15.059829
#> 2 carb     2     21.34058  1.234609 18.92079  23.76037 6.071610e-67 17.285289
#> 3 carb     3     21.41667  2.191616 17.12118  25.71215 1.483584e-22  9.772091
#> 4 carb     4     18.88406  1.210941 16.51066  21.25746 7.929506e-55 15.594538
#> 5 carb     6     19.76014  3.551143 12.80003  26.72026 2.629853e-08  5.564447
#> 6 carb     8     20.11667  3.510156 13.23689  26.99645 9.984697e-09  5.730989

The contrast between marginal means for carb==1 and carb==2 is:

21.66232 - 21.34058 
#> [1] 0.32174

or

21.66232 + -(21.34058)
#> [1] 0.32174

or

sum(c(21.66232, 21.34058) * c(1, -1))
#> [1] 0.32174

or

c(21.66232, 21.34058) %*% c(1, -1)
#>         [,1]
#> [1,] 0.32174

The last two commands express the contrast of interest as a linear combination of marginal means.

Simple contrast

In the marginalmeans() function, we can supply a hypothesis argument to compute linear combinations of marginal means. This argument must be a numeric vector with the same length as the number of rows in the output of marginalmeans(). For example, in the previous output there were six rows, and the two marginal means we want to compare are in the first two positions:

lc <- c(1, -1, 0, 0, 0, 0)
marginalmeans(mod, variables = "carb", hypothesis = lc)
#>     term marginalmean std.error  conf.low conf.high   p.value statistic
#> 1 custom    0.3217391  1.773733 -3.154713  3.798191 0.8560607  0.181391
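
This is the same linear combination we computed manually above, now applied to the full-precision marginal means:

sum(mm$marginalmean * lc)
#> [1] 0.3217391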

Complex contrast

Of course, we can also estimate more complex contrasts:

lc <- c(-2, 1, 1, 0, -1, 1)
marginalmeans(mod, variables = "carb", hypothesis = lc)
#>     term marginalmean std.error  conf.low conf.high   p.value   statistic
#> 1 custom   -0.2108696  6.928859 -13.79118  13.36945 0.9757213 -0.03043352

emmeans produces similar results:

library(emmeans)
em <- emmeans(mod, "carb")
lc <- data.frame(custom_contrast = c(-2, 1, 1, 0, -1, 1))
contrast(em, method = lc)
#>  contrast        estimate   SE df t.ratio p.value
#>  custom_contrast   -0.211 6.93 24  -0.030  0.9760
#> 
#> Results are averaged over the levels of: cyl

Multiple contrasts

Users can also compute multiple linear combinations simultaneously by supplying a numeric matrix to hypothesis. This matrix must have the same number of rows as the output of marginalmeans(), and each of its columns represents a distinct set of weights defining one linear combination. The column names of the matrix become labels in the output. For example:

lc <- matrix(c(
    -2, 1, 1, 0, -1, 1,
    1, -1, 0, 0, 0, 0
    ), ncol = 2)
colnames(lc) <- c("Contrast A", "Contrast B")
lc
#>      Contrast A Contrast B
#> [1,]         -2          1
#> [2,]          1         -1
#> [3,]          1          0
#> [4,]          0          0
#> [5,]         -1          0
#> [6,]          1          0

marginalmeans(mod, variables = "carb", hypothesis = lc)
#>         term marginalmean std.error   conf.low conf.high   p.value   statistic
#> 1 Contrast A   -0.2108696  6.928859 -13.791184 13.369445 0.9757213 -0.03043352
#> 2 Contrast B    0.3217391  1.773733  -3.154713  3.798191 0.8560607  0.18139099
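
Equivalently, both point estimates can be obtained at once as a matrix product between the weights and the vector of marginal means computed earlier:

# each column of lc defines one linear combination of the marginal means
t(lc) %*% mm$marginalmean   # Contrast A: -0.2108696, Contrast B: 0.3217391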

Contrasts across response levels

In models with multinomial outcomes, one may be interested in comparing outcomes or contrasts across response levels. For example, in this model there are 18 estimated marginal means, across 6 outcome levels (the group column):

library(nnet)
mod <- multinom(carb ~ mpg + cyl, data = dat, trace = FALSE)
mm <- marginalmeans(mod, type = "probs")
mm
#>    group term value marginalmean    std.error
#> 1      1  cyl     4 3.678521e-01 2.598299e-01
#> 2      2  cyl     4 6.310922e-01 2.600886e-01
#> 3      3  cyl     4 6.852625e-04 1.192096e-02
#> 4      4  cyl     4 2.118825e-04 1.750265e-02
#> 5      6  cyl     4 8.815299e-06 8.406582e-05
#> 6      8  cyl     4 1.496776e-04 7.970023e-03
#> 7      1  cyl     6 2.828726e-01 1.957050e-01
#> 8      2  cyl     6 1.848644e-06 2.394710e-06
#> 9      3  cyl     6 1.121687e-05 1.323445e-03
#> 10     4  cyl     6 5.562672e-01 2.184901e-01
#> 11     6  cyl     6 1.608378e-01 1.539716e-01
#> 12     8  cyl     6 9.291803e-06 7.982244e-04
#> 13     1  cyl     8 4.625030e-04 9.586613e-03
#> 14     2  cyl     8 6.654572e-01 3.745272e-01
#> 15     3  cyl     8 3.103763e-01 3.714795e-01
#> 16     4  cyl     8 9.582000e-03 2.289015e-02
#> 17     6  cyl     8 4.345753e-09 9.895429e-08
#> 18     8  cyl     8 1.412195e-02 4.673316e-02

Let’s contrast the marginal means in the first outcome level when cyl equals 4 and 6. These marginal means are located in rows 1 and 7 respectively:

lc <- rep(0, nrow(mm))
lc[1] <- -1
lc[7] <- 1
marginalmeans(
    mod,
    type = "probs",
    hypothesis = lc)
#>     term marginalmean std.error
#> 1 custom  -0.08497951  0.320746

This is indeed equal to the result we would have obtained manually:

2.828726e-01 - 3.678521e-01
#> [1] -0.0849795

Now let’s say we want to calculate a “contrast in contrasts”, that is, the outcome of a 3-step process:

  1. Contrast between cyl=6 and cyl=4 in the 1st outcome level
  2. Contrast between cyl=6 and cyl=4 in the 2nd outcome level
  3. Contrast between the contrasts defined in steps 1 and 2.

We create the linear combination weights as follows:

lc <- rep(0, nrow(mm))
lc[c(1, 8)] <- -1
lc[c(7, 2)] <- 1

To make sure that the weights are correct, we can display them side by side with the original marginalmeans() output:

transform(mm[, 1:3], lc = lc)
#>    group term value lc
#> 1      1  cyl     4 -1
#> 2      2  cyl     4  1
#> 3      3  cyl     4  0
#> 4      4  cyl     4  0
#> 5      6  cyl     4  0
#> 6      8  cyl     4  0
#> 7      1  cyl     6  1
#> 8      2  cyl     6 -1
#> 9      3  cyl     6  0
#> 10     4  cyl     6  0
#> 11     6  cyl     6  0
#> 12     8  cyl     6  0
#> 13     1  cyl     8  0
#> 14     2  cyl     8  0
#> 15     3  cyl     8  0
#> 16     4  cyl     8  0
#> 17     6  cyl     8  0
#> 18     8  cyl     8  0

Compute the results:

marginalmeans(mod, type = "probs", hypothesis = lc)
#>     term marginalmean std.error
#> 1 custom    0.5461109 0.5498044
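
Again, this matches a manual computation based on the raw marginal means displayed above: the cyl 6 vs. 4 contrast in the 1st outcome level, minus the same contrast in the 2nd outcome level.

(2.828726e-01 - 3.678521e-01) - (1.848644e-06 - 6.310922e-01)
#> [1] 0.5461109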

Pairwise contrasts: Difference-in-Differences

Now we illustrate how to use the machinery described above to do pairwise comparisons between contrasts, a type of analysis often associated with a “Difference-in-Differences” research design.

First, we simulate data with two treatment groups and pre/post periods:

library(data.table)

N <- 1000
did <- data.table(
    id = 1:N,
    pre = rnorm(N),
    trt = sample(0:1, N, replace = TRUE))
did$post <- did$pre + did$trt * 0.3 + rnorm(N)
did <- melt(
    did,
    value.name = "y",
    variable.name = "time",
    id.vars = c("id", "trt"))
head(did)
#>    id trt time          y
#> 1:  1   1  pre  0.5941759
#> 2:  2   1  pre  0.4250376
#> 3:  3   0  pre  1.5833849
#> 4:  4   1  pre  0.8573142
#> 5:  5   1  pre -0.4107030
#> 6:  6   1  pre -1.5716398

Then, we estimate a linear model with an interaction between the time and treatment indicators. We also compute contrasts for the time variable, evaluated at each level of the treatment:

did_model <- lm(y ~ time * trt, data = did)

comparisons(
    did_model,
    newdata = datagrid(trt = 0:1),
    variables = "time")
#>   rowid     type term   contrast   comparison  std.error   statistic
#> 1     1 response time post - pre -0.004672243 0.07875075 -0.05932951
#> 2     2 response time post - pre  0.321646925 0.07906638  4.06806184
#>        p.value   conf.low conf.high    predicted predicted_hi predicted_lo time
#> 1 0.9526896533 -0.1590209 0.1496764  0.001185225 -0.003487019  0.001185225  pre
#> 2 0.0000474058  0.1666797 0.4766142 -0.071434043  0.250212883 -0.071434043  pre
#>   trt
#> 1   0
#> 2   1

Finally, we compute pairwise differences between contrasts. This is the Diff-in-Diff estimate:

comparisons(
    did_model,
    variables = "time",
    newdata = datagrid(trt = 0:1),
    hypothesis = "pairwise")
#>       type          term comparison std.error statistic     p.value  conf.low
#> 1 response Row 2 - Row 1  0.3263192 0.1115938   2.92417 0.003453763 0.1075994
#>   conf.high
#> 1  0.545039
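
As expected, this estimate is equal to the difference between the two group-specific contrasts computed in the previous step:

0.321646925 - (-0.004672243)
#> [1] 0.3263192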