vignettes/hypothesis.Rmd
hypothesis.Rmd
The code in this vignette requires marginaleffects
version 0.6.0 or the development version hosted on Github.
This vignette introduces the deltamethod()
function, and the hypothesis
argument of the comparisons()
, marginaleffects()
, and predictions()
function. These features allow users to conduct linear and non-linear hypothesis tests and to compute custom contrasts (linear combinations) between parameters.
Version 0.6.0 of marginaleffects
includes a very simple yet powerful function called deltamethod()
. This function emulates the behavior of the well-established car::deltaMethod
and car::linearHypothesis
functions, but it supports more models, requires fewer dependencies, and offers some convenience features like shortcuts for robust standard errors.
deltamethod()
can be used to compute estimates and standard errors of arbitrary functions of model parameters. For example, it can be used to conduct tests of equality between coefficients, or to test the value of some linear or non-linear combination of quantities of interest. deltamethod()
can also be used to conduct hypothesis tests on other functions of a model’s parameter, such as adjusted predictions or marginal effects.
Let’s start by estimating a simple model:
library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)
When the FUN
and hypothesis
arguments of deltamethod()
equal NULL
(the default), the function returns a data.frame of raw estimates:
deltamethod(mod)
#> term estimate
#> 1 (Intercept) 35.84599532
#> 2 hp -0.02311981
#> 3 wt -3.18140405
#> 4 factor(cyl)6 -3.35902490
#> 5 factor(cyl)8 -3.18588444
Test of equality between coefficients:
deltamethod(mod, "hp = wt")
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 hp = wt 3.158284 0.7199081 4.387066 1.148899e-05 1.74729 4.569278
Non-linear function of coefficients
deltamethod(mod, "exp(hp + wt) = 0.1")
#> term estimate std.error statistic p.value conf.low
#> 1 exp(hp + wt) = 0.1 -0.05942178 0.02919718 -2.035189 0.04183184 -0.1166472
#> conf.high
#> 1 -0.002196363
The vcov
argument behaves in the same was as in the marginaleffects()
function. It allows us to easily compute robust standard errors:
deltamethod(mod, "hp = wt", vcov = "HC3")
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 hp = wt 3.158284 0.8051929 3.922394 8.767334e-05 1.580135 4.736433
We can use shortcuts like b1
, b2
, ...
to identify the position of each parameter in the output of FUN
. For example, b2=b3
is equivalent to hp=wt
because those term names appear in the 2nd and 3rd row when we call deltamethod(mod)
.
deltamethod(mod, "b2 = b3")
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 b2 = b3 3.158284 0.7199081 4.387066 1.148899e-05 1.74729 4.569278
Term names with special characters must be enclosed in backticks:
deltamethod(mod, "`factor(cyl)6` = `factor(cyl)8`")
#> term estimate std.error statistic p.value
#> 1 `factor(cyl)6` = `factor(cyl)8` -0.1731405 1.653923 -0.1046847 0.916626
#> conf.low conf.high
#> 1 -3.41477 3.06849
The FUN
argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a numeric vector or a data.frame with two columns named term
and estimate
.
mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)
f <- function(x) predict(x, type = "link", newdata = mtcars)
p <- deltamethod(mod, FUN = f)
head(p)
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 b1 -1.0983601 0.7160423 -1.5339319 0.12504640 -2.501777 0.3050570
#> 2 b2 -1.0983601 0.7160423 -1.5339319 0.12504640 -2.501777 0.3050570
#> 3 b3 0.2331884 0.7808207 0.2986452 0.76521076 -1.297192 1.7635688
#> 4 b4 -0.5945143 0.6471012 -0.9187346 0.35823441 -1.862809 0.6737808
#> 5 b5 -0.4175761 0.6474633 -0.6449417 0.51896494 -1.686581 0.8514287
#> 6 b6 -5.0264654 2.1949096 -2.2900558 0.02201808 -9.328409 -0.7245217
Test of equality between two predictions (row 2 vs row 3):
f <- function(x) predict(x, newdata = mtcars)
deltamethod(mod, FUN = f, hypothesis = "b2 = b3")
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 b2 = b3 -1.331548 0.6155513 -2.16318 0.03052731 -2.538007 -0.1250901
Note that we specified the newdata
argument in the f
function. This is because the predict()
method associated with lm
objects will automatically the original fitted values when newdata
is NULL
, instead of returning the slightly altered fitted values which we need to compute numerical derivatives in the delta method.
We can also use numeric vectors to specify linear combinations of parameters. For example, there are 3 coefficients in the last model we estimated. To test the null hypothesis that the sum of the 2nd and 3rd coefficients is equal to 0, we can do:
deltamethod(mod, hypothesis = c(0, 1, 1))
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 custom 1.314659 0.5927081 2.218055 0.02655107 0.1529728 2.476346
See below for more example of how to use string formulas, numeric vectors, or matrices to calculate custom contrasts, linear combinations, and linear or non-linear hypothesis tests.
hypothesis
Formulas
Each of the 4 core functions of the package support a hypothesis
argument which behaves similarly to the deltamethod()
function. This argument allows users to specify custom hypothesis tests and contrasts, in order to test null hypotheses such as:
For example, let’s fit a model and compute some marginal effects at the mean:
library(marginaleffects)
mod <- lm(mpg ~ am + vs, data = mtcars)
marginaleffects(mod, newdata = "mean")
#> rowid type term dydx std.error statistic p.value conf.low
#> 1 1 response am 6.066667 1.274842 4.758758 1.947875e-06 3.568022
#> 2 1 response vs 6.929365 1.262132 5.490207 4.014620e-08 4.455632
#> conf.high am vs eps
#> 1 8.565312 0.40625 0.4375 1e-04
#> 2 9.403098 0.40625 0.4375 1e-04
Is the marginal effect of am
different from the marginal effect of vs
? To answer this question we can run a linear hypothesis test:
library(marginaleffects)
mod <- lm(mpg ~ am + vs, data = mtcars)
marginaleffects(
mod,
newdata = "mean",
hypothesis = "am = vs")
#> type term dydx std.error statistic p.value conf.low conf.high
#> 1 response am=vs -0.8626984 1.939057 -0.4449063 0.6563875 -4.66318 2.937783
The hypothesis
string can include any valid R
expression, so we can run some silly non-linear tests:
marginaleffects(
mod,
newdata = "mean",
hypothesis = "exp(am) - 2 * vs = -400")
#> type term dydx std.error statistic p.value conf.low
#> 1 response exp(am)-2*vs=-400 817.3821 550.2221 1.485549 0.1373984 -261.0335
#> conf.high
#> 1 1895.798
Now consider the case of adjusted predictions:
p <- predictions(
mod,
newdata = datagrid(am = 0:1, vs = 0:1))
p
#> rowid type predicted std.error statistic p.value conf.low conf.high
#> 1 1 response 14.59444 0.9261514 15.75816 6.036286e-56 12.70025 16.48864
#> 2 2 response 21.52381 1.1300269 19.04717 6.935682e-81 19.21265 23.83497
#> 3 3 response 20.66111 1.1830035 17.46496 2.648808e-68 18.24160 23.08063
#> 4 4 response 27.59048 1.1300269 24.41577 1.163052e-131 25.27931 29.90164
#> am vs
#> 1 0 0
#> 2 0 1
#> 3 1 0
#> 4 1 1
Since there is no term
column in the output of the predictions
function, we must use parameter identifiers like b1
, b2
, etc. to determine which estimates we want to compare:
predictions(
mod,
hypothesis = "b1 = b2",
newdata = datagrid(am = 0:1, vs = 0:1))
#> type term predicted std.error statistic p.value conf.low conf.high
#> 1 response b1=b2 -6.929365 1.262132 -5.490208 4.01461e-08 -9.403098 -4.455633
p$predicted[1] - p$predicted[2]
#> [1] -6.929365
In the next section, we will see that we can get equivalent results by using a vector of contrast weights, which will be used to compute a linear combination of estimates:
predictions(
mod,
hypothesis = c(1, -1, 0, 0),
newdata = datagrid(am = 0:1, vs = 0:1))
#> type term predicted std.error statistic p.value conf.low conf.high
#> 1 response custom -6.929365 1.262132 -5.490208 4.01461e-08 -9.403098 -4.455633
There are many more possibilities:
predictions(
mod,
hypothesis = "b1 + b2 = 30",
newdata = datagrid(am = 0:1, vs = 0:1))
#> type term predicted std.error statistic p.value conf.low
#> 1 response b1+b2=30 6.118254 1.635988 3.73979 0.0001841737 2.911776
#> conf.high
#> 1 9.324732
p$predicted[1] + p$predicted[2] - 30
#> [1] 6.118254
predictions(
mod,
hypothesis = "(b2 - b1) / (b3 - b2) = 0",
newdata = datagrid(am = 0:1, vs = 0:1))
#> type term predicted std.error statistic p.value conf.low
#> 1 response (b2-b1)/(b3-b2)=0 -8.032199 16.96625 -0.4734223 0.6359119 -41.28543
#> conf.high
#> 1 25.22103
The standard workflow with the marginaleffects
package is to first call a function like marginaleffects()
or comparisons()
to compute unit-level quantities, and then to call summary()
to aggregate the unit-level quantities into “Average Marginal Effects” or “Average Contrasts.” Unfortunately, the summary()
and tidy()
functions do not have a hypothesis
argument, so we cannot use it to conduct hypothesis tests on aggregated quantities. Instead, we can use the transform_pre
argument to emulate the behavior of summary()
, computing average marginal effects in a single step.
First, note that these three commands produce the same results:
comparisons(mod) |>
tidy()
#> type term contrast estimate std.error statistic p.value conf.low
#> 1 response am 1 - 0 6.066667 1.274842 4.758759 1.947871e-06 3.568022
#> 2 response vs 1 - 0 6.929365 1.262132 5.490208 4.014610e-08 4.455633
#> conf.high
#> 1 8.565312
#> 2 9.403098
comparisons(
mod,
transform_pre = "differenceavg")
#> type term contrast comparison std.error statistic p.value
#> 1 response am mean(1) - mean(0) 6.066667 1.274842 4.758759 1.947871e-06
#> 2 response vs mean(1) - mean(0) 6.929365 1.262132 5.490208 4.014610e-08
#> conf.low conf.high
#> 1 3.568022 8.565312
#> 2 4.455633 9.403098
Notice that in the last one we did not need to use tidy()
or summary()
, and yet we still obtained average contrasts. See the transformations section of the Contrasts vignette for more details.
With these results in hand, we can finally conduct a linear hypothesis test between average marginal effects:
comparisons(
mod,
hypothesis = "am = vs",
transform_pre = "differenceavg")
#> type term comparison std.error statistic p.value conf.low conf.high
#> 1 response am=vs -0.8626984 1.939056 -0.4449063 0.6563875 -4.663179 2.937782
Computing contrasts between average marginal effects requires a little care to obtain the right scale. In particular, we need to specify both the variables
and the transform_pre
:
comparisons(
mod,
hypothesis = "am = vs",
variables = c("am", "vs"),
transform_pre = "dydxavg")
#> type term comparison std.error statistic p.value conf.low conf.high
#> 1 response am=vs -0.8626984 1.939057 -0.4449062 0.6563875 -4.66318 2.937783
hypothesis
Vectors and Matrices
The marginalmeans()
function computes estimated marginal means. The hypothesis
argument of that function offers a powerful mechanism to estimate custom contrasts between marginal means, by way of linear combination.
Consider a simple example:
library(marginaleffects)
library(emmeans)
library(nnet)
dat <- mtcars
dat$carb <- factor(dat$carb)
dat$cyl <- factor(dat$cyl)
dat$am <- as.logical(dat$am)
mod <- lm(mpg ~ carb + cyl, dat)
mm <- marginalmeans(mod, variables = "carb")
mm
#> term value marginalmean std.error conf.low conf.high p.value statistic
#> 1 carb 1 21.66232 1.438417 18.84307 24.48156 2.975538e-51 15.059829
#> 2 carb 2 21.34058 1.234609 18.92079 23.76037 6.071610e-67 17.285289
#> 3 carb 3 21.41667 2.191616 17.12118 25.71215 1.483584e-22 9.772091
#> 4 carb 4 18.88406 1.210941 16.51066 21.25746 7.929506e-55 15.594538
#> 5 carb 6 19.76014 3.551143 12.80003 26.72026 2.629853e-08 5.564447
#> 6 carb 8 20.11667 3.510156 13.23689 26.99645 9.984697e-09 5.730989
The contrast between marginal means for carb==1
and carb==2
is:
21.66232 - 21.34058
#> [1] 0.32174
or
21.66232 + -(21.34058)
#> [1] 0.32174
or
or
The last two commands express the contrast of interest as a linear combination of marginal means.
In the marginalmeans()
function, we can supply a hypothesis
argument to compute linear combinations of marginal means. This argument must be a numeric vector of the same length as the number of rows in the output of marginalmeans()
. For example, in the previous there were six rows, and the two marginal means we want to compare are at in the first two positions:
lc <- c(1, -1, 0, 0, 0, 0)
marginalmeans(mod, variables = "carb", hypothesis = lc)
#> term marginalmean std.error conf.low conf.high p.value statistic
#> 1 custom 0.3217391 1.773733 -3.154713 3.798191 0.8560607 0.181391
Of course, we can also estimate more complex contrasts:
lc <- c(-2, 1, 1, 0, -1, 1)
marginalmeans(mod, variables = "carb", hypothesis = lc)
#> term marginalmean std.error conf.low conf.high p.value statistic
#> 1 custom -0.2108696 6.928859 -13.79118 13.36945 0.9757213 -0.03043352
emmeans
produces similar results:
Users can also compute multiple linear combinations simultaneously by supplying a numeric matrix to hypothesis
. This matrix must have the same number of rows as the output of marginaleffects()
, and each column represents a distinct set of weights for different linear combinations. The column names of the matrix become labels in the output. For example:
lc <- matrix(c(
-2, 1, 1, 0, -1, 1,
1, -1, 0, 0, 0, 0
), ncol = 2)
colnames(lc) <- c("Contrast A", "Contrast B")
lc
#> Contrast A Contrast B
#> [1,] -2 1
#> [2,] 1 -1
#> [3,] 1 0
#> [4,] 0 0
#> [5,] -1 0
#> [6,] 1 0
marginalmeans(mod, variables = "carb", hypothesis = lc)
#> term marginalmean std.error conf.low conf.high p.value statistic
#> 1 Contrast A -0.2108696 6.928859 -13.791184 13.369445 0.9757213 -0.03043352
#> 2 Contrast B 0.3217391 1.773733 -3.154713 3.798191 0.8560607 0.18139099
In models with multinomial outcomes, one may be interested in comparing outcomes or contrasts across response levels. For example, in this model there are 18 estimated marginal means, across 6 outcome levels (the group
column):
library(nnet)
mod <- multinom(carb ~ mpg + cyl, data = dat, trace = FALSE)
mm <- marginalmeans(mod, type = "probs")
mm
#> group term value marginalmean std.error
#> 1 1 cyl 4 3.678521e-01 2.598299e-01
#> 2 2 cyl 4 6.310922e-01 2.600886e-01
#> 3 3 cyl 4 6.852625e-04 1.192096e-02
#> 4 4 cyl 4 2.118825e-04 1.750265e-02
#> 5 6 cyl 4 8.815299e-06 8.406582e-05
#> 6 8 cyl 4 1.496776e-04 7.970023e-03
#> 7 1 cyl 6 2.828726e-01 1.957050e-01
#> 8 2 cyl 6 1.848644e-06 2.394710e-06
#> 9 3 cyl 6 1.121687e-05 1.323445e-03
#> 10 4 cyl 6 5.562672e-01 2.184901e-01
#> 11 6 cyl 6 1.608378e-01 1.539716e-01
#> 12 8 cyl 6 9.291803e-06 7.982244e-04
#> 13 1 cyl 8 4.625030e-04 9.586613e-03
#> 14 2 cyl 8 6.654572e-01 3.745272e-01
#> 15 3 cyl 8 3.103763e-01 3.714795e-01
#> 16 4 cyl 8 9.582000e-03 2.289015e-02
#> 17 6 cyl 8 4.345753e-09 9.895429e-08
#> 18 8 cyl 8 1.412195e-02 4.673316e-02
Let’s contrast the marginal means in the first outcome level when cyl
equals 4 and 6. These marginal means are located in rows 1 and 7 respectively:
lc <- rep(0, nrow(mm))
lc[1] <- -1
lc[7] <- 1
marginalmeans(
mod,
type = "probs",
hypothesis = lc)
#> term marginalmean std.error
#> 1 custom -0.08497951 0.320746
This is indeed equal to the results we would have obtained manually:
2.828726e-01 - 3.678521e-01
#> [1] -0.0849795
Now let’s say we want to calculate a “contrast in contrasts”, that is, the outcome of a 3-step process:
cyl=6
and cyl=4
in the 1st outcome levelcyl=6
and cyl=4
in the 2nd outcome levelWe create the linear combination weights as follows:
To make sure that the weights are correct, we can display them side by side with the original marginalmeans()
output:
transform(mm[, 1:3], lc = lc)
#> group term value lc
#> 1 1 cyl 4 -1
#> 2 2 cyl 4 1
#> 3 3 cyl 4 0
#> 4 4 cyl 4 0
#> 5 6 cyl 4 0
#> 6 8 cyl 4 0
#> 7 1 cyl 6 1
#> 8 2 cyl 6 -1
#> 9 3 cyl 6 0
#> 10 4 cyl 6 0
#> 11 6 cyl 6 0
#> 12 8 cyl 6 0
#> 13 1 cyl 8 0
#> 14 2 cyl 8 0
#> 15 3 cyl 8 0
#> 16 4 cyl 8 0
#> 17 6 cyl 8 0
#> 18 8 cyl 8 0
Compute the results:
marginalmeans(mod, type = "probs", hypothesis = lc)
#> term marginalmean std.error
#> 1 custom 0.5461109 0.5498044
Now we illustrate how to use the machinery described above to do pairwise comparisons between contrasts, a type of analysis often associated with a “Difference-in-Differences” research design.
First, we simultate data with two treatment groups and pre/post periods:
library(data.table)
N <- 1000
did <- data.table(
id = 1:N,
pre = rnorm(N),
trt = sample(0:1, N, replace = TRUE))
did$post <- did$pre + did$trt * 0.3 + rnorm(N)
did <- melt(
did,
value.name = "y",
variable.name = "time",
id.vars = c("id", "trt"))
head(did)
#> id trt time y
#> 1: 1 1 pre 0.5941759
#> 2: 2 1 pre 0.4250376
#> 3: 3 0 pre 1.5833849
#> 4: 4 1 pre 0.8573142
#> 5: 5 1 pre -0.4107030
#> 6: 6 1 pre -1.5716398
Then, we estimate a linear model with a multiple interaction between the time and the treatment indicators. We alco compute contrasts at the mean for each treatment level:
did_model <- lm(y ~ time * trt, data = did)
comparisons(
did_model,
newdata = datagrid(trt = 0:1),
variables = "time")
#> rowid type term contrast comparison std.error statistic
#> 1 1 response time post - pre -0.004672243 0.07875075 -0.05932951
#> 2 2 response time post - pre 0.321646925 0.07906638 4.06806184
#> p.value conf.low conf.high predicted predicted_hi predicted_lo time
#> 1 0.9526896533 -0.1590209 0.1496764 0.001185225 -0.003487019 0.001185225 pre
#> 2 0.0000474058 0.1666797 0.4766142 -0.071434043 0.250212883 -0.071434043 pre
#> trt
#> 1 0
#> 2 1
Finally, we compute pairwise differences between contrasts. This is the Diff-in-Diff estimate:
comparisons(
did_model,
variables = "time",
newdata = datagrid(trt = 0:1),
hypothesis = "pairwise")
#> type term comparison std.error statistic p.value conf.low
#> 1 response Row 2 - Row 1 0.3263192 0.1115938 2.92417 0.003453763 0.1075994
#> conf.high
#> 1 0.545039