Hypothesis Tests

The marginaleffects package can conduct linear or non-linear hypothesis tests on the coefficients of any supported model class, or on the quantities generated by any of the other functions of the package: predictions(), comparisons(), or slopes().

There are two main entry points for hypothesis tests:

  1. marginaleffects functions: Use the hypothesis argument.
  2. Other R objects and models: Use the hypotheses() function.

Both the hypothesis argument and the hypotheses() function accept several input types, which allow a lot of flexibility in the specification of hypothesis tests:

  1. Numeric: Value of a null hypothesis.
  2. Formula: Equation of a (non-)linear hypothesis test.
  3. Function: Tests on arbitrary transformations or aggregations.
  4. Matrix: Contrast matrices and vectors.

This vignette shows how to use each of these strategies to conduct hypothesis tests on model coefficients or on the quantities estimated by the marginaleffects package. After reading it, you will be able to specify custom hypothesis tests and contrasts, such as tests of equality between two coefficients, two predictions, or two comparisons.

Numeric (null hypothesis)

The simplest way to modify a hypothesis test is to change the null hypothesis. By default, all functions in the marginaleffects package assume that the null is 0. This can be changed with the hypothesis argument.

Coefficients

Consider a simple logistic regression model:

library(marginaleffects)
mod <- glm(am ~ hp + drat, data = mtcars, family = binomial)

By default, the summary() function will report the results of hypothesis tests where the null is set to 0:

summary(mod)
#> 
#> Call:
#> glm(formula = am ~ hp + drat, family = binomial, data = mtcars)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)  
#> (Intercept) -29.076080  12.416916  -2.342   0.0192 *
#> hp            0.010793   0.009328   1.157   0.2473  
#> drat          7.309781   3.046597   2.399   0.0164 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 43.230  on 31  degrees of freedom
#> Residual deviance: 20.144  on 29  degrees of freedom
#> AIC: 26.144
#> 
#> Number of Fisher Scoring iterations: 7

Using hypotheses(), we can easily change the null hypothesis:

hypotheses(mod, hypothesis = 6)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(Intercept) -29.0761 12.41692 -2.82 0.00473 7.7 -53.41279 -4.7394
hp 0.0108 0.00933 -642.04 < 0.001 Inf -0.00749 0.0291
drat 7.3098 3.04660 0.43 0.66726 0.6 1.33856 13.2810
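
The z statistics above are ordinary Wald statistics: (estimate - null) / standard error. As a quick check, here is a minimal sketch that reproduces the z value for drat by hand:

b <- coef(mod)["drat"]
se <- sqrt(diag(vcov(mod)))["drat"]
(b - 6) / se # Wald statistic against a null of 6; about 0.43, as reported above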

Predictions

Changing the value of the null is particularly important in the context of predictions, where the 0 baseline may not be particularly meaningful. For example, here we compute the predicted outcome for a hypothetical unit where all regressors are fixed to their sample means:

predictions(mod, newdata = "mean")
Estimate Pr(>|z|) S 2.5 % 97.5 % hp drat
0.231 0.135 2.9 0.0584 0.592 147 3.6

The Z statistic and p value reported above assume that the null hypothesis equals zero. We can change the null with the hypothesis argument:

predictions(mod, newdata = "mean", hypothesis = .5)
Estimate Pr(>|z|) S 2.5 % 97.5 % hp drat
0.231 0.0343 4.9 0.0584 0.592 147 3.6

Comparisons and slopes

When computing different quantities of interest like risk ratios, it can make sense to set the null hypothesis to 1 rather than 0:

avg_comparisons(
    mod,
    variables = "hp",
    comparison = "ratio",
    hypothesis = 1) |>
    print(digits = 5)
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp +1 1.0064 0.005613 1.1457 0.25193 2.0 0.99543 1.0174

Formula

The hypotheses() function emulates the formula-based syntax of the well-established car::deltaMethod and car::linearHypothesis functions. However, marginaleffects supports more models, requires fewer dependencies, and offers some convenience features like the vcov argument for robust standard errors.

The syntax we illustrate in this section takes the form of a string formula, for example: "am = 2 * vs".

Coefficients

Let’s start by estimating a simple model:

library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)

When the FUN and hypothesis arguments of hypotheses() equal NULL (the default), the function returns a data.frame of raw estimates:

hypotheses(mod)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(Intercept) 35.8460 2.041 17.56 <0.001 227.0 31.8457 39.846319
hp -0.0231 0.012 -1.93 0.0531 4.2 -0.0465 0.000306
wt -3.1814 0.720 -4.42 <0.001 16.6 -4.5918 -1.771012
factor(cyl)6 -3.3590 1.402 -2.40 0.0166 5.9 -6.1062 -0.611803
factor(cyl)8 -3.1859 2.170 -1.47 0.1422 2.8 -7.4399 1.068169

Test of equality between coefficients:

hypotheses(mod, "hp = wt")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp = wt 3.16 0.72 4.39 <0.001 16.4 1.75 4.57
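
A linear hypothesis like this one is simply a contrast applied to the coefficient vector, so we can reproduce it with basic matrix algebra. Here is a minimal sketch, assuming the coefficient order printed above:

K <- c(0, 1, -1, 0, 0) # weights that select hp minus wt
est <- drop(K %*% coef(mod)) # estimate of the difference; about 3.16
se <- sqrt(drop(t(K) %*% vcov(mod) %*% K)) # standard error of the contrast
c(estimate = est, z = est / se) # matches the row above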

Non-linear function of coefficients

hypotheses(mod, "exp(hp + wt) = 0.1")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
exp(hp + wt) = 0.1 -0.0594 0.0292 -2.04 0.0418 4.6 -0.117 -0.0022

The vcov argument behaves in the same way as in all the other marginaleffects functions, allowing us to easily compute robust standard errors:

hypotheses(mod, "hp = wt", vcov = "HC3")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp = wt 3.16 0.805 3.92 <0.001 13.5 1.58 4.74

We can use shortcuts like b1, b2, ... to refer to parameters by their position in the output. For example, b2 = b3 is equivalent to hp = wt because those terms appear in the 2nd and 3rd rows when we call hypotheses(mod).

hypotheses(mod, "b2 = b3")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b2 = b3 3.16 0.72 4.39 <0.001 16.4 1.75 4.57

hypotheses(mod, hypothesis = "b* / b3 = 1")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1 / b3 = 1 -12.26735 2.07340 -5.9165 <0.001 28.2 -16.33 -8.204
b2 / b3 = 1 -0.99273 0.00413 -240.5539 <0.001 Inf -1.00 -0.985
b3 / b3 = 1 0.00000 NA NA NA NA NA NA
b4 / b3 = 1 0.05583 0.58287 0.0958 0.924 0.1 -1.09 1.198
b5 / b3 = 1 0.00141 0.82981 0.0017 0.999 0.0 -1.62 1.628

Term names with special characters must be enclosed in backticks:

hypotheses(mod, "`factor(cyl)6` = `factor(cyl)8`")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
`factor(cyl)6` = `factor(cyl)8` -0.173 1.65 -0.105 0.917 0.1 -3.41 3.07

Predictions

Now consider the case of adjusted predictions:

mod <- lm(mpg ~ am + vs, data = mtcars)

p <- predictions(
    mod,
    newdata = datagrid(am = 0:1, vs = 0:1))
p
am vs Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 0 14.6 0.926 15.8 <0.001 183.4 12.8 16.4
0 1 21.5 1.130 19.0 <0.001 266.3 19.3 23.7
1 0 20.7 1.183 17.5 <0.001 224.5 18.3 23.0
1 1 27.6 1.130 24.4 <0.001 435.0 25.4 29.8

Since there is no term column in the output of the predictions function, we must use parameter identifiers like b1, b2, etc. to indicate which estimates we want to compare:

hypotheses(p, hypothesis = "b1 = b2")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46

Or directly:

predictions(
    mod,
    hypothesis = "b1 = b2",
    newdata = datagrid(am = 0:1, vs = 0:1))
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1=b2 -6.93 1.26 -5.49 <0.001 24.6 -9.4 -4.46

p$estimate[1] - p$estimate[2]
#> [1] -6.929365

There are many more possibilities:

predictions(
    mod,
    hypothesis = "b1 + b2 = 30",
    newdata = datagrid(am = 0:1, vs = 0:1))
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
b1+b2=30 6.12 1.64 3.74 <0.001 12.4 2.91 9.32

p$estimate[1] + p$estimate[2] - 30
#> [1] 6.118254

predictions(
    mod,
    hypothesis = "(b2 - b1) / (b3 - b2) = 0",
    newdata = datagrid(am = 0:1, vs = 0:1))
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(b2-b1)/(b3-b2)=0 -8.03 17 -0.473 0.636 0.7 -41.3 25.2

Comparisons and slopes

The avg_comparisons() function allows us to answer questions of this form: On average, how does the expected outcome change when I change one regressor by some amount? Consider this:

mod <- lm(mpg ~ am + vs, data = mtcars)

cmp <- avg_comparisons(mod)
cmp
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
am 1 - 0 6.07 1.27 4.76 <0.001 19.0 3.57 8.57
vs 1 - 0 6.93 1.26 5.49 <0.001 24.6 4.46 9.40

This tells us that, on average, moving am from 0 to 1 changes the predicted outcome by about 6.1, and moving vs from 0 to 1 changes the predicted outcome by about 6.9.

Is the difference between those two estimates statistically significant? In other words, is the effect of am equal to the effect of vs? To answer this question, we use the hypothesis argument:

avg_comparisons(mod, hypothesis = "am = vs")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
am=vs -0.863 1.94 -0.445 0.656 0.6 -4.66 2.94
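
We can confirm this estimate manually by differencing the two rows of the cmp object computed above:

cmp$estimate[1] - cmp$estimate[2] # about -0.863, as reported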

The hypothesis string can include any valid R expression, so we can run some silly non-linear tests:

avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
exp(am)-2*vs=-400 817 550 1.49 0.137 2.9 -261 1896

Note that the p values and confidence intervals are calculated using the delta method and are thus based on the assumption that the hypothesis expression is approximately normally distributed. For (very) non-linear functions of the parameters, this is not realistic: we get p values with incorrect error rates and confidence intervals with incorrect coverage probabilities. For such hypotheses, it is better to calculate the confidence intervals using the bootstrap (see the inferences vignette for details).

While the confidence interval from the delta method is symmetric, equal to the estimate ± 1.96 times the standard error, the (perhaps) more reliable confidence interval from the bootstrap is highly skewed.

set.seed(1234)

avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400") |>
  inferences(method = "boot")
Term Estimate Std. Error 2.5 % 97.5 %
exp(am)-2*vs=-400 817 1854 414 6990
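
The inferences() function supports other resampling and simulation-based strategies as well. For example, this sketch (see the inferences documentation for the full list of supported methods) uses simulation-based inference instead of the bootstrap:

avg_comparisons(mod, hypothesis = "exp(am) - 2 * vs = -400") |>
  inferences(method = "simulation")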

The same approach can be taken to compare slopes:

mod <- lm(mpg ~ qsec * hp, data = mtcars)

avg_slopes(mod, hypothesis = "10 * hp - qsec = 0")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
10*hp-qsec=0 0.262 0.353 0.742 0.458 1.1 -0.43 0.954

Functions

Hypothesis tests can also be conducted using arbitrary R functions. This allows users to test hypotheses on complex aggregations or transformations. To achieve this, we define a custom function which accepts a fitted model or a marginaleffects object, and returns a data frame with at least two columns: term (or hypothesis) and estimate.

Predictions

When supplying a function to the hypothesis argument, that function must accept an argument x: a data frame with columns rowid and estimate (plus optional columns for other elements of newdata). It must return a data frame with columns term (or hypothesis) and estimate.

In this example, we test if the mean predicted value is different from 2:

dat <- transform(mtcars, gear = factor(gear), cyl = factor(cyl))

mod <- lm(wt ~ mpg * hp * cyl, data = dat)

hyp <- function(x) {
    data.frame(
        hypothesis = "Avg(Ŷ) = 2",
        estimate = mean(x$estimate) - 2
    )
}
predictions(mod, hypothesis = hyp)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Avg(Ŷ) = 2 1.22 0.0914 13.3 <0.001 132.0 1.04 1.4

In this ordinal logit model, the predictions() function returns one row per observation and per level of the outcome variable:

library(MASS)
library(dplyr)

mod <- polr(gear ~ cyl + hp, dat)

avg_predictions(mod)
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 0.471 0.0584 8.05 <0.001 50.2 0.3561 0.585
4 0.366 0.0715 5.12 <0.001 21.7 0.2263 0.507
5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257

We can use a function in the hypothesis argument to collapse the rows, displaying the average predicted values in groups 3-4 vs. 5:

fun <- function(x) {
    out <- x |> 
        mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |>
        summarize(estimate = mean(estimate), .by = term)
    return(out)
}
avg_predictions(mod, hypothesis = fun)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 & 4 0.419 0.0239 17.51 <0.001 225.5 0.3717 0.465
5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257

And we can compare the two categories by doing:

fun <- function(x) {
    out <- x |> 
        mutate(term = ifelse(group %in% 3:4, "3 & 4", "5")) |>
        summarize(estimate = mean(estimate), .by = term) |>
        summarize(estimate = diff(estimate), term = "5 - (3 & 4)")
    return(out)
}
avg_predictions(mod, hypothesis = fun)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
5 - (3 & 4) -0.256 0.0717 -3.56 <0.001 11.4 -0.396 -0.115

Comparisons and slopes

In the same ordinal logit model, we can estimate the average effect of an increase of 1 unit in hp on the expected probability of every level of the outcome:

avg_comparisons(mod, variables = "hp")
Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 hp +1 -0.00774 0.002294 -3.38 <0.001 10.4 -0.01224 -0.00325
4 hp +1 0.00258 0.002201 1.17 0.24 2.1 -0.00173 0.00690
5 hp +1 0.00516 0.000882 5.85 <0.001 27.6 0.00343 0.00689

Compare estimates across outcome levels:

fun <- function(x) {
    x |> 
    mutate(estimate = (estimate - lag(estimate)),
           group = sprintf("%s - %s", group, lag(group))) |>
    filter(!is.na(estimate))
}
avg_comparisons(mod, variables = "hp", hypothesis = fun)
Group Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
4 - 3 hp +1 0.01033 0.00441 2.34 0.0191 5.7 0.00169 0.01897
5 - 4 hp +1 0.00258 0.00245 1.05 0.2922 1.8 -0.00222 0.00737

Now suppose we want to compare the effect of hp for different levels of the outcome, but this time we do the computation within levels of cyl:

avg_comparisons(mod, variables = "hp", by = "cyl")
Group Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 hp mean(+1) 4 -0.007862 0.003445 -2.282 0.02248 5.5 -1.46e-02 -0.00111
3 hp mean(+1) 6 -0.013282 0.005038 -2.637 0.00838 6.9 -2.32e-02 -0.00341
3 hp mean(+1) 8 -0.004882 0.001457 -3.350 < 0.001 10.3 -7.74e-03 -0.00203
4 hp mean(+1) 4 -0.003024 0.003714 -0.814 0.41560 1.3 -1.03e-02 0.00426
4 hp mean(+1) 6 0.008573 0.006431 1.333 0.18252 2.5 -4.03e-03 0.02118
4 hp mean(+1) 8 0.003995 0.001627 2.455 0.01409 6.1 8.06e-04 0.00719
5 hp mean(+1) 4 0.010886 0.002652 4.104 < 0.001 14.6 5.69e-03 0.01608
5 hp mean(+1) 6 0.004709 0.001929 2.442 0.01462 6.1 9.29e-04 0.00849
5 hp mean(+1) 8 0.000887 0.000431 2.059 0.03947 4.7 4.27e-05 0.00173

fun <- function(x) {
    x |> 
    mutate(estimate = (estimate - lag(estimate)),
           group = sprintf("%s - %s", group, lag(group)), 
           .by = "cyl") |>
    filter(!is.na(estimate))
}
avg_comparisons(mod, variables = "hp", by = "cyl", hypothesis = fun)
Group Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
4 - 3 hp mean(+1) 4 0.00484 0.00666 0.727 0.46719 1.1 -0.008205 0.017882
4 - 3 hp mean(+1) 6 0.02185 0.01139 1.919 0.05503 4.2 -0.000471 0.044180
4 - 3 hp mean(+1) 8 0.00888 0.00306 2.902 0.00371 8.1 0.002882 0.014874
5 - 4 hp mean(+1) 4 0.01391 0.00546 2.548 0.01082 6.5 0.003211 0.024607
5 - 4 hp mean(+1) 6 -0.00386 0.00805 -0.480 0.63117 0.7 -0.019638 0.011911
5 - 4 hp mean(+1) 8 -0.00311 0.00188 -1.651 0.09869 3.3 -0.006798 0.000581

specify_hypothesis()

Supplying a function to the hypothesis argument is a powerful strategy, but it can be tedious because of the "boilerplate" code involved. specify_hypothesis() is an experimental function which handles much of that work automatically: label creation, group-wise estimation, etc.

To begin, we estimate a simple model and make predictions on a grid with 6 rows:

dat <- transform(mtcars, gear = factor(gear))
mod <- lm(hp ~ mpg * am * factor(cyl), data = dat)

nd <- datagrid(am = 0:1, cyl = sort(unique(dat$cyl)), model = mod)

predictions(mod, newdata = nd)
am cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % mpg
0 4 119 36.1 3.306 <0.001 10.0 48.6 190 20.1
0 6 114 14.7 7.726 <0.001 46.4 85.0 143 20.1
0 8 163 15.1 10.842 <0.001 88.6 133.9 193 20.1
1 4 105 18.5 5.681 <0.001 26.2 68.9 141 20.1
1 6 155 17.8 8.735 <0.001 58.5 120.6 190 20.1
1 8 -117 202.6 -0.576 0.564 0.8 -513.9 280 20.1

By default, specify_hypothesis() computes a reference hypothesis, that is, the difference between each row and the first row in the output (rowid[1]):

hyp <- specify_hypothesis()
predictions(mod, newdata = nd, hypothesis = hyp)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(rowid[2], am[0], cyl[6]) - (rowid[1], am[0], cyl[4]) -5.58 39.0 -0.143 0.886 0.2 -82.1 70.9
(rowid[3], am[0], cyl[8]) - (rowid[1], am[0], cyl[4]) 43.96 39.2 1.123 0.262 1.9 -32.8 120.7
(rowid[4], am[1], cyl[4]) - (rowid[1], am[0], cyl[4]) -14.32 40.6 -0.353 0.724 0.5 -93.9 65.3
(rowid[5], am[1], cyl[6]) - (rowid[1], am[0], cyl[4]) 35.98 40.3 0.893 0.372 1.4 -43.0 114.9
(rowid[6], am[1], cyl[8]) - (rowid[1], am[0], cyl[4]) -236.28 205.8 -1.148 0.251 2.0 -639.7 167.1

Alternatively, we can specify a "sequential" hypothesis to compare each row to the one before it:

hyp <- specify_hypothesis(hypothesis = "sequential")
predictions(mod, newdata = nd, hypothesis = hyp)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(rowid[2], am[0], cyl[6]) - (rowid[1], am[0], cyl[4]) -5.58 39.0 -0.143 0.8863 0.2 -82.0816 70.9
(rowid[3], am[0], cyl[8]) - (rowid[2], am[0], cyl[6]) 49.54 21.1 2.349 0.0188 5.7 8.2139 90.9
(rowid[4], am[1], cyl[4]) - (rowid[3], am[0], cyl[8]) -58.29 23.9 -2.442 0.0146 6.1 -105.0756 -11.5
(rowid[5], am[1], cyl[6]) - (rowid[4], am[1], cyl[4]) 50.31 25.7 1.959 0.0501 4.3 -0.0202 100.6
(rowid[6], am[1], cyl[8]) - (rowid[5], am[1], cyl[6]) -272.26 203.4 -1.339 0.1807 2.5 -670.8987 126.4

A more powerful customization option is to supply one’s own comparison function, along with a function to create labels. Compare each row to the global mean:

hyp <- specify_hypothesis(
    hypothesis = \(x) x - mean(x),
    label = \(x) sprintf("%s - ȳ", x),
    label_columns = "rowid"
)
predictions(mod, newdata = nd, hypothesis = hyp)
Hypothesis am cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % mpg
rowid[1] - ȳ 0 4 29.4 45.6 0.644 0.5194 0.9 -59.97 118.7 20.1
rowid[2] - ȳ 0 6 23.8 36.8 0.647 0.5176 1.0 -48.28 95.9 20.1
rowid[3] - ȳ 0 8 73.3 36.9 1.990 0.0466 4.4 1.09 145.6 20.1
rowid[4] - ȳ 1 4 15.0 37.9 0.397 0.6913 0.5 -59.22 89.3 20.1
rowid[5] - ȳ 1 6 65.4 37.7 1.735 0.0827 3.6 -8.46 139.2 20.1
rowid[6] - ȳ 1 8 -206.9 169.0 -1.224 0.2210 2.2 -538.22 124.4 20.1

The ratio of each row to the first:

hyp <- specify_hypothesis(
    hypothesis =  \(x) x / x[1],
    label = \(x) sprintf("(%s) / (%s)", x, x[1])
)
predictions(mod, newdata = nd, hypothesis = hyp)
Hypothesis am cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % mpg
(rowid[1], am[0], cyl[4]) / (rowid[1], am[0], cyl[4]) 0 4 1.000 NA NA NA NA NA NA 20.1
(rowid[2], am[0], cyl[6]) / (rowid[1], am[0], cyl[4]) 0 6 0.953 0.314 3.039 0.00237 8.7 0.339 1.57 20.1
(rowid[3], am[0], cyl[8]) / (rowid[1], am[0], cyl[4]) 0 8 1.368 0.433 3.162 0.00157 9.3 0.520 2.22 20.1
(rowid[4], am[1], cyl[4]) / (rowid[1], am[0], cyl[4]) 1 4 0.880 0.308 2.857 0.00427 7.9 0.276 1.48 20.1
(rowid[5], am[1], cyl[6]) / (rowid[1], am[0], cyl[4]) 1 6 1.301 0.421 3.092 0.00199 9.0 0.476 2.13 20.1
(rowid[6], am[1], cyl[8]) / (rowid[1], am[0], cyl[4]) 1 8 -0.977 1.721 -0.568 0.57012 0.8 -4.351 2.40 20.1

We can use the by argument to specify subgroups in which to apply the function. For example, we may want to compute the ratio of each row to the reference (first) row of each subgroup of am. Notice that the denominator rowid changes depending on the am subgroup:

hyp <- specify_hypothesis(
    by = "am",
    hypothesis =  \(x) x / x[1],
    label = \(x) sprintf("(%s) / (%s)", x, x[1])
)
predictions(mod, newdata = nd, hypothesis = hyp)
Hypothesis am cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % mpg
(rowid[1], cyl[4]) / (rowid[1], cyl[4]) 0 4 1.000 NA NA NA NA NA NA 20.1
(rowid[2], cyl[6]) / (rowid[1], cyl[4]) 0 6 0.953 0.314 3.039 0.00237 8.7 0.339 1.57 20.1
(rowid[3], cyl[8]) / (rowid[1], cyl[4]) 0 8 1.368 0.433 3.162 0.00157 9.3 0.520 2.22 20.1
(rowid[4], cyl[4]) / (rowid[4], cyl[4]) 1 4 1.000 NA NA NA NA NA NA 20.1
(rowid[5], cyl[6]) / (rowid[4], cyl[4]) 1 6 1.478 0.310 4.763 < 0.001 19.0 0.870 2.09 20.1
(rowid[6], cyl[8]) / (rowid[4], cyl[4]) 1 8 -1.111 1.937 -0.573 0.56631 0.8 -4.906 2.68 20.1

term, contrast, and group

By default, specify_hypothesis() will always apply comparison functions within subgroups of the term, contrast, and group columns. For example:

library(MASS)
dat <- transform(mtcars, gear = factor(gear), cyl = factor(cyl))
mod <- polr(gear ~ cyl + hp, dat, Hess = TRUE)

avg_predictions(mod, by = "am")
Group am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 1 0.1877 0.0709 2.65 0.00811 6.9 0.04874 0.327
3 0 0.6642 0.0575 11.56 < 0.001 100.3 0.55163 0.777
4 1 0.4984 0.0982 5.08 < 0.001 21.3 0.30597 0.691
4 0 0.2761 0.0617 4.48 < 0.001 17.0 0.15523 0.397
5 1 0.3138 0.0791 3.97 < 0.001 13.7 0.15871 0.469
5 0 0.0597 0.0289 2.06 0.03907 4.7 0.00299 0.116

hyp <- specify_hypothesis()
avg_predictions(mod, by = "am", hypothesis = hyp)
Group Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 (rowid[2], am[0]) - (rowid[1], am[1]) 0.477 0.0494 9.65 <0.001 70.7 0.380 0.5733
4 (rowid[4], am[0]) - (rowid[3], am[1]) -0.222 0.0664 -3.35 <0.001 10.3 -0.352 -0.0923
5 (rowid[6], am[0]) - (rowid[5], am[1]) -0.254 0.0559 -4.54 <0.001 17.5 -0.364 -0.1446

To avoid applying the function within groups, and instead compare group levels to one another, we can specify by = "rowid":

avg_predictions(mod)
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 0.471 0.0584 8.05 <0.001 50.2 0.3561 0.585
4 0.366 0.0715 5.12 <0.001 21.7 0.2263 0.507
5 0.163 0.0478 3.41 <0.001 10.6 0.0692 0.257

hyp <- specify_hypothesis(
    by = "rowid",
    hypothesis = "sequential",
    label_columns = "group"
)
avg_predictions(mod, hypothesis = hyp)
Hypothesis Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
(group[4]) - (group[3]) -0.104 0.122 -0.858 0.3911 1.4 -0.342 0.13396
(group[5]) - (group[4]) -0.204 0.107 -1.907 0.0565 4.1 -0.413 0.00564

Fitted models

The hypothesis argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a data.frame with two columns named term and estimate.

Here, we test if the sum of the hp and mpg coefficients is equal to 2:

mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)

fun <- function(x) {
    b <- coef(x)
    out <- data.frame(
        term = "hp + mpg = 2",
        estimate = b["hp"] + b["mpg"] - 2,
        row.names = NULL
    )
    return(out)
}

hypotheses(mod, hypothesis = fun)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
hp + mpg = 2 -0.685 0.593 -1.16 0.248 2.0 -1.85 0.476
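
Since this particular hypothesis is a linear function of the coefficients, the same test can be written more compactly with the string formula interface shown earlier. This sketch should return the same estimate:

hypotheses(mod, hypothesis = "hp + mpg = 2")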

Test of equality between two predictions:

fun <- function(x) {
    p <- predict(x, newdata = mtcars)
    out <- data.frame(term = "pred[2] = pred[3]", estimate = p[2] - p[3])
    return(out)
}
hypotheses(mod, hypothesis = fun)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
pred[2] = pred[3] -1.33 0.616 -2.16 0.0305 5.0 -2.54 -0.125

We can also use more complex aggregation patterns. In this ordinal logistic regression model, we model the number of gears for each car. If we compute fitted values with the predictions() function, we obtain one predicted probability for each individual car and for each level of the response variable:

library(MASS)
library(dplyr)

dat <- transform(mtcars, 
    gear = factor(gear),
    cyl = factor(cyl))
mod <- polr(gear ~ cyl + hp, dat)

predictions(mod)
Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.0440 0.04256 1.03 0.30081 1.7 -0.0394 0.127
3 0.3931 0.19125 2.06 0.03982 4.7 0.0183 0.768
3 0.9963 0.00721 138.17 < 0.001 Inf 0.9822 1.010
5 0.6969 0.18931 3.68 < 0.001 12.1 0.3258 1.068
5 0.0555 0.06851 0.81 0.41775 1.3 -0.0788 0.190
5 0.8115 0.20626 3.93 < 0.001 13.5 0.4073 1.216
5 0.9111 0.16818 5.42 < 0.001 24.0 0.5815 1.241
5 0.6322 0.19648 3.22 0.00129 9.6 0.2471 1.017

There are three levels to the outcome: 3, 4, and 5. Imagine that, for each car in the dataset, we want to collapse the categories of the outcome variable into two ("3 & 4" and "5") by taking sums of predicted probabilities. Then, we want to average those predicted probabilities within each level of cyl. To do so, we define a custom function and pass it to the hypothesis argument of the hypotheses() function:

fun <- function(x) {
    predictions(x, vcov = FALSE) |>
        # label the new categories of outcome levels
        mutate(group = ifelse(group %in% c("3", "4"), "3 & 4", "5")) |>
        # sum of probabilities at the individual level
        summarize(estimate = sum(estimate), .by = c("rowid", "cyl", "group")) |>
        # average probabilities for each value of `cyl`
        summarize(estimate = mean(estimate), .by = c("cyl", "group")) |>
        # the custom function must return a `term` column
        rename(term = cyl)
}

hypotheses(mod, hypothesis = fun)
Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3 & 4 6 0.8390 0.0651 12.89 <0.001 123.9 0.7115 0.967
3 & 4 4 0.7197 0.1099 6.55 <0.001 34.0 0.5044 0.935
3 & 4 8 0.9283 0.0174 53.45 <0.001 Inf 0.8943 0.962
5 6 0.1610 0.0651 2.47 0.0134 6.2 0.0334 0.289
5 4 0.2803 0.1099 2.55 0.0108 6.5 0.0649 0.496
5 8 0.0717 0.0174 4.13 <0.001 14.7 0.0377 0.106

Note that this workflow will not work with Bayesian models or with the bootstrap. However, with those models it is trivial to do the same kind of aggregation by calling posterior_draws() and operating directly on draws from the posterior distribution. See the vignette on Bayesian analysis for examples with the posterior_draws() function.

Vectors and Matrices

The predictions() function can estimate marginal means. The hypothesis argument of that function offers a powerful mechanism to estimate custom contrasts between marginal means, by way of linear combination.

Simple contrast

Consider a simple example:

library(marginaleffects)
library(emmeans)
library(nnet)

dat <- mtcars
dat$carb <- factor(dat$carb)
dat$cyl <- factor(dat$cyl)
dat$am <- as.logical(dat$am)

mod <- lm(mpg ~ carb + cyl, dat)
mm <- predictions(mod,
    by = "carb",
    newdata = datagrid(grid_type = "balanced"))
mm
carb Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
1 21.7 1.44 15.06 <0.001 167.8 18.8 24.5
2 21.3 1.23 17.29 <0.001 220.0 18.9 23.8
3 21.4 2.19 9.77 <0.001 72.5 17.1 25.7
4 18.9 1.21 15.59 <0.001 179.7 16.5 21.3
6 19.8 3.55 5.56 <0.001 25.2 12.8 26.7
8 20.1 3.51 5.73 <0.001 26.6 13.2 27.0

The contrast between marginal means for carb==1 and carb==2 is:

21.66232 - 21.34058 
#> [1] 0.32174

or

21.66232 + -(21.34058)
#> [1] 0.32174

or

sum(c(21.66232, 21.34058) * c(1, -1))
#> [1] 0.32174

or

c(21.66232, 21.34058) %*% c(1, -1)
#>         [,1]
#> [1,] 0.32174

The last two commands express the contrast of interest as a linear combination of marginal means.

In the predictions() function, we can supply a hypothesis argument to compute linear combinations of marginal means. This argument must be a numeric vector of the same length as the number of rows in the output. For example, in the previous output there were six rows, and the two marginal means we want to compare are in the first two positions:

lc <- c(1, -1, 0, 0, 0, 0)
predictions(mod,
    by = "carb",
    newdata = datagrid(grid_type = "balanced"),
    hypothesis = lc)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
custom 0.322 1.77 0.181 0.856 0.2 -3.15 3.8
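
Because mm is itself a marginaleffects object, an equivalent way to obtain this contrast is to pass the weight vector to hypotheses() after the fact. A sketch:

hypotheses(mm, hypothesis = lc)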

Complex contrast

Of course, we can also estimate more complex contrasts:

lc <- c(0, -2, 1, 1, -1, 1)
predictions(mod,
    by = "carb",
    newdata = datagrid(grid_type = "balanced"),
    hypothesis = lc)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
custom -2.02 6.32 -0.32 0.749 0.4 -14.4 10.4

emmeans produces similar results:

library(emmeans)
em <- emmeans(mod, "carb")
lc <- data.frame(custom_contrast = c(0, -2, 1, 1, -1, 1))
contrast(em, method = lc)
#>  contrast        estimate   SE df t.ratio p.value
#>  custom_contrast    -2.02 6.32 24  -0.320  0.7516
#> 
#> Results are averaged over the levels of: cyl

Multiple contrasts

Users can also compute multiple linear combinations simultaneously by supplying a numeric matrix to the hypothesis argument. This matrix must have the same number of rows as the output (here, one row per marginal mean), and each column represents a distinct set of weights for a linear combination. The column names of the matrix become labels in the output. For example:

lc <- matrix(c(
    -2, 1, 1, 0, -1, 1,
    1, -1, 0, 0, 0, 0
    ), ncol = 2)
colnames(lc) <- c("Contrast A", "Contrast B")
lc
#>      Contrast A Contrast B
#> [1,]         -2          1
#> [2,]          1         -1
#> [3,]          1          0
#> [4,]          0          0
#> [5,]         -1          0
#> [6,]          1          0

predictions(mod,
    by = "carb",
    newdata = datagrid(grid_type = "balanced"),
    hypothesis = lc)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Contrast A -0.211 6.93 -0.0304 0.976 0.0 -13.79 13.4
Contrast B 0.322 1.77 0.1814 0.856 0.2 -3.15 3.8

Arbitrary quantities

marginaleffects can also compute uncertainty estimates for arbitrary quantities hosted in a data frame, as long as the user can supply a variance-covariance matrix. (Thanks to Kyle F Butts for this cool feature and example!)

Say you run a Monte Carlo simulation and you want to perform hypothesis tests on various quantities returned from each simulation. The quantities are correlated within each draw:

# simulated means and medians
draw <- function(i) { 
  x <- rnorm(n = 10000, mean = 0, sd = 1)
  out <- data.frame(median = median(x), mean =  mean(x))
  return(out)
}
sims <- do.call("rbind", lapply(1:25, draw))

# average mean and average median 
coeftable <- data.frame(
  term = c("median", "mean"),
  estimate = c(mean(sims$median), mean(sims$mean))
)

# variance-covariance
vcov <- cov(sims)

# is the median equal to the mean?
hypotheses(
  coeftable,
  vcov = vcov,
  hypothesis = "median = mean"
)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
median = mean -0.00134 0.00576 -0.232 0.816 0.3 -0.0126 0.00996
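
The reported standard error is obtained by applying the contrast weights to the user-supplied covariance matrix. A quick sketch:

K <- c(1, -1) # median minus mean, in the order of coeftable
sqrt(drop(t(K) %*% vcov %*% K)) # standard error of the difference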

Joint hypothesis tests

The hypotheses() function can also test multiple hypotheses jointly. For example, consider this model:

model <- lm(mpg ~ as.factor(cyl) * hp, data = mtcars)
coef(model)
#>        (Intercept)    as.factor(cyl)6    as.factor(cyl)8                 hp as.factor(cyl)6:hp as.factor(cyl)8:hp 
#>        35.98302564       -15.30917451       -17.90295193        -0.11277589         0.10516262         0.09853177

We may want to test the null hypothesis that two of the coefficients are jointly (both) equal to zero.

hypotheses(model, joint = c("as.factor(cyl)6:hp", "as.factor(cyl)8:hp"))
F Pr(>|F|) Df 1 Df 2
2.11 0.142 2 26
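
This is a standard Wald test. For intuition, here is a minimal sketch of the F statistic computed by hand, assuming the coefficient order printed above (the two interaction terms sit in positions 5 and 6):

R <- matrix(0, nrow = 2, ncol = length(coef(model))) # restriction matrix
R[1, 5] <- R[2, 6] <- 1 # select the two interaction coefficients
Rb <- R %*% coef(model) # deviations from the null of zero
Fstat <- drop(t(Rb) %*% solve(R %*% vcov(model) %*% t(R)) %*% Rb) / nrow(R)
Fstat # close to the 2.11 reported above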

The joint argument allows users to flexibly specify the parameters to be tested, using character vectors, integer indices, or Perl-compatible regular expressions. We can also specify the null hypothesis for each parameter individually using the hypothesis argument.

Naturally, the hypotheses function also works with marginaleffects objects.

# joint hypotheses: regular expression
hypotheses(model, joint = "cyl")
F Pr(>|F|) Df 1 Df 2
5.7 0.00197 4 26

# joint hypotheses: integer indices
hypotheses(model, joint = 2:3)
F Pr(>|F|) Df 1 Df 2
6.12 0.00665 2 26

# joint hypotheses: different null hypotheses
hypotheses(model, joint = 2:3, hypothesis = 1)
F Pr(>|F|) Df 1 Df 2
6.84 0.00411 2 26

hypotheses(model, joint = 2:3, hypothesis = 1:2)
F Pr(>|F|) Df 1 Df 2
7.47 0.00273 2 26

# joint hypotheses: marginaleffects object
cmp <- avg_comparisons(model)
hypotheses(cmp, joint = "cyl")
F Pr(>|F|) Df 1 Df 2
1.6 0.221 2 26

We can also combine multiple calls to hypotheses to execute a joint test on linear combinations of coefficients:

# fit model
mod <- lm(mpg ~ factor(carb), mtcars)

# hypothesis matrix for linear combinations
H <- matrix(0, nrow = length(coef(mod)), ncol = 2)
H[2:3, 1] <- H[4:6, 2] <- 1

# test individual linear combinations
hyp <- hypotheses(mod, hypothesis = H)
hyp
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
custom -12.0 4.92 -2.44 0.01477 6.1 -21.6 -2.35
custom -25.5 9.03 -2.83 0.00466 7.7 -43.2 -7.85
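
Each column of H defines a linear combination of coefficients, so we can verify the two "custom" estimates directly:

sum(coef(mod)[2:3]) # first combination; about -12.0
sum(coef(mod)[4:6]) # second combination; about -25.5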

# test joint hypotheses
# hypotheses(hyp, joint = TRUE, hypothesis = c(-10, -20))

More

Difference-in-Differences

Now we illustrate how to use the machinery described above to do pairwise comparisons between contrasts, a type of analysis often associated with a “Difference-in-Differences” research design.

First, we simulate data with two treatment groups and pre/post periods:

library(data.table)

N <- 1000
did <- data.table(
    id = 1:N,
    pre = rnorm(N),
    trt = sample(0:1, N, replace = TRUE))
did$post <- did$pre + did$trt * 0.3 + rnorm(N)
did <- melt(
    did,
    value.name = "y",
    variable.name = "time",
    id.vars = c("id", "trt"))
head(did)
#>       id   trt   time          y
#>    <int> <int> <fctr>      <num>
#> 1:     1     1    pre -0.0790894
#> 2:     2     1    pre -0.3289587
#> 3:     3     0    pre -1.8710657
#> 4:     4     0    pre -1.1760871
#> 5:     5     0    pre -0.3169719
#> 6:     6     0    pre  0.6995667

Then, we estimate a linear model with an interaction between the time and treatment indicators. We also compute contrasts at the mean for each treatment level:

did_model <- lm(y ~ time * trt, data = did)

comparisons(
    did_model,
    newdata = datagrid(trt = 0:1),
    variables = "time")
Term Contrast trt Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % time
time post - pre 0 0.0565 0.0800 0.707 0.4798 1.1 -0.1003 0.213 pre
time post - pre 1 0.1995 0.0776 2.570 0.0102 6.6 0.0474 0.352 pre

Finally, we compute pairwise differences between contrasts. This is the Diff-in-Diff estimate:

comparisons(
    did_model,
    variables = "time",
    newdata = datagrid(trt = 0:1),
    hypothesis = "pairwise")
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Row 1 - Row 2 -0.143 0.111 -1.28 0.2 2.3 -0.361 0.0755
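
As a sanity check, the Diff-in-Diff estimate is simply the difference between the two contrasts computed in the previous step:

cmp <- comparisons(
    did_model,
    variables = "time",
    newdata = datagrid(trt = 0:1))
cmp$estimate[1] - cmp$estimate[2] # about -0.143, matching the estimate above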