If you do not like marginaleffects
, you may want to
consider one of the alternatives described below:
-
margins
: https://cran.r-project.org/web/packages/margins/index.html -
prediction
: https://cran.r-project.org/web/packages/prediction/index.html -
emmeans
: https://cran.r-project.org/web/packages/emmeans/index.html -
brmsmargins
: https://joshuawiley.com/brmsmargins/ -
effects
: https://cran.r-project.org/web/packages/effects/index.html -
modelbased
: https://easystats.github.io/modelbased/ -
ggeffects
: https://strengejacke.github.io/ggeffects/ -
Stata
by StataCorp LLC
emmeans
The emmeans
package is developed by Russell V. Lenth and colleagues.
emmeans
is a truly incredible piece of software, and a
trailblazer in the R
ecosystem. It is an extremely
powerful package whose functionality overlaps
marginaleffects
to a significant degree: marginal means,
contrasts, and slopes. Even if the two packages can compute many of the
same quantities, emmeans
and marginaleffects
have pretty different philosophies with respect to user interface and
computation.
An emmeans
analysis typically starts by computing “marginal
means” by holding all numeric covariates at their means, and by
averaging across a balanced grid of categorical predictors. Then, users
can use the contrast()
function to estimate the difference
between marginal means.
The marginaleffects
package supplies a
marginal_means
function which can replicate most
emmeans
analyses by computing marginal means. However, the
typical analysis is more squarely centered on predicted/fitted values.
This is a useful starting point because, in many cases, analysts will
find it easy and intuitive to express their scientific queries in terms
of changes in predicted values. For example,
- How does the average predicted probability of survival differ between treatment and control group?
- What is the difference between the predicted wage of college and high school graduates?
Let’s say we estimate a linear regression model with two continuous regressors and a multiplicative interaction:
\[y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x \cdot z + \varepsilon\]
In this model, the effect of \(x\)
on \(y\) will depend on the value of
covariate \(z\). Let’s say the user
wants to estimate what happens to the predicted value of \(y\) when \(x\) increases by 1 unit, when \(z \in \{-1, 0, 1\}\). To do this, we use
the comparisons()
function. The variables
argument determines the scientific query of interest, and the
newdata
argument determines the grid of covariate values on
which we want to evaluate the query:
model <- lm(y ~ x * z, data)
comparisons(
model,
variables = list(x = 1), # what is the effect of 1-unit change in x?
newdata = datagrid(z = -1:1) # when z is held at values -1, 0, or 1
)
As the
vignettes show, marginaleffects
can also compute
contrasts on marginal means. It can also compute various quantities of
interest like raw fitted values, slopes (partial derivatives), and
contrasts between marginal means. It also offers a flexible mechanism to
run (non-)linear hypothesis tests using the delta method, and it offers
fully customizable strategy to compute quantities like odds ratios (or
completely arbitrary functions of predicted outcome).
Thus, in my (Vincent’s) biased opinion, the main benefits of
marginaleffects
over emmeans
are:
- Support more model types.
- Simpler, more intuitive, and highly consistent user interface.
- Easier to compute average slopes or unit-level contrasts for whole datasets.
- Easier to compute slopes (aka marginal effects, trends, or partial derivatives) for custom grids and continuous regressors.
- Easier to implement causal inference strategies like the parametric g-formula and regression adjustment in experiments (see vignettes).
- Allows the computation of arbitrary quantities of interest via user-supplied functions and automatic delta method inference.
- Common plots are easy with the
plot_predictions()
,plot_comparisons()
, andplot_slopes()
functions.
To be fair, many of the marginaleffects
advantages
listed above come down to subjective preferences over user interface.
Readers are thus encouraged to try both packages to see which interface
they prefer.
AFAICT, the main advantages of emmeans
over
marginaleffects
are:
- Omnibus tests.
- Multiplicity adjustments.
- Marginal means (but not slopes or contrasts) are often faster to compute.
Please let me know if you find other features in emmeans
so I can add them to this list.
The Marginal
Means Vignette includes side-by-side comparisons of
emmeans
and marginaleffects
to compute
marginal means. The rest of this section compares the syntax for
contrasts and marginaleffects.
Contrasts
As far as I can tell, emmeans
does not provide an easy
way to compute unit-level contrasts for every row of the dataset used to
fit our model. Therefore, the side-by-side syntax shown below will
always include newdata=datagrid()
to specify that we want
to compute only one contrast: at the mean values of the regressors. In
day-to-day practice with slopes()
, however, this extra
argument would not be necessary.
Fit a model:
library(emmeans)
library(marginaleffects)
mod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)
Link scale, pairwise contrasts:
emm <- emmeans(mod, specs = "cyl")
contrast(emm, method = "revpairwise", adjust = "none", df = Inf)
#> contrast estimate SE df z.ratio p.value
#> cyl6 - cyl4 -0.905 1.63 Inf -0.555 0.5789
#> cyl8 - cyl4 -19.542 4367.17 Inf -0.004 0.9964
#> cyl8 - cyl6 -18.637 4367.16 Inf -0.004 0.9966
#>
#> Degrees-of-freedom method: user-specified
#> Results are given on the log odds ratio (not the response) scale.
comparisons(mod,
type = "link",
newdata = "mean",
variables = list(cyl = "pairwise"))
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> cyl 6 - 4 -0.905 1.63 -0.55506 0.579 -4.1 2.29
#> cyl 8 - 4 -19.542 4367.17 -0.00447 0.996 -8579.0 8539.95
#> cyl 8 - 6 -18.637 4367.17 -0.00427 0.997 -8578.1 8540.85
#>
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, vs, hp, cyl
Response scale, reference groups:
emm <- emmeans(mod, specs = "cyl", regrid = "response")
contrast(emm, method = "trt.vs.ctrl1", adjust = "none", df = Inf, ratios = FALSE)
#> contrast estimate SE df z.ratio p.value
#> cyl6 - cyl4 -0.222 0.394 Inf -0.564 0.5727
#> cyl8 - cyl4 -0.595 0.511 Inf -1.163 0.2447
#>
#> Degrees-of-freedom method: user-specified
comparisons(mod, newdata = "mean")
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> hp +1 -1.56e-10 6.80e-07 -0.000229 1.000 -1.33e-06 1.33e-06
#> cyl 6 - 4 -2.22e-01 3.94e-01 -0.564207 0.573 -9.94e-01 5.50e-01
#> cyl 8 - 4 -5.95e-01 5.11e-01 -1.163438 0.245 -1.60e+00 4.07e-01
#>
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, vs, hp, cyl
Contrasts by group
Here is a slightly more complicated example with contrasts estimated
by subgroup in a lme4
mixed effects model. First we
estimate a model and compute pairwise contrasts by subgroup using
emmeans
:
library(dplyr)
library(lme4)
library(emmeans)
dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/lme4/VerbAgg.csv")
dat$woman <- as.numeric(dat$Gender == "F")
mod <- glmer(
woman ~ btype * resp + situ + (1 + Anger | item),
family = binomial,
data = dat)
emmeans(mod, specs = "btype", by = "resp") |>
contrast(method = "revpairwise", adjust = "none")
#> resp = no:
#> contrast estimate SE df z.ratio p.value
#> scold - curse -0.0152 0.1097 Inf -0.139 0.8898
#> shout - curse -0.2533 0.1022 Inf -2.479 0.0132
#> shout - scold -0.2381 0.0886 Inf -2.686 0.0072
#>
#> resp = perhaps:
#> contrast estimate SE df z.ratio p.value
#> scold - curse -0.2393 0.1178 Inf -2.031 0.0422
#> shout - curse -0.0834 0.1330 Inf -0.627 0.5309
#> shout - scold 0.1559 0.1358 Inf 1.148 0.2510
#>
#> resp = yes:
#> contrast estimate SE df z.ratio p.value
#> scold - curse 0.0391 0.1292 Inf 0.302 0.7624
#> shout - curse 0.5802 0.1784 Inf 3.252 0.0011
#> shout - scold 0.5411 0.1888 Inf 2.866 0.0042
#>
#> Results are averaged over the levels of: situ
#> Results are given on the log odds ratio (not the response) scale.
What did emmeans
do to obtain these results? Roughly
speaking:
- Create a prediction grid with one cell for each combination of categorical predictors in the model, and all numeric variables held at their means.
- Make adjusted predictions in each cell of the prediction grid.
- Take the average of those predictions (marginal means) for each
combination of
btype
(focal variable) andresp
(groupby
variable). - Compute pairwise differences (contrasts) in marginal means across
different levels of the focal variable
btype
.
In short, emmeans
computes pairwise contrasts between
marginal means, which are themselves averages of adjusted
predictions. This is different from the default types of contrasts
produced by comparisons()
, which reports contrasts between
adjusted predictions, without averaging across a pre-specified
grid of predictors. What does comparisons()
do instead?
Let newdata
be a data frame supplied by the user (or the
original data frame used to fit the model), then:
- Create a new data frame called
newdata2
, which is identical tonewdata
except that the focal variable is incremented by one level. - Compute contrasts as the difference between adjusted predictions
made on the two datasets:
predict(model, newdata = newdata2) - predict(model, newdata = newdata)
Although it is not idiomatic, we can use still use
comparisons()
to emulate the emmeans
results.
First, we create a prediction grid with one cell for each combination of
categorical predictor in the model:
nd <- datagrid(
model = mod,
resp = dat$resp,
situ = dat$situ,
btype = dat$btype)
nrow(nd)
#> [1] 18
This grid has 18 rows, one for each combination of levels for the
resp
(3), situ
(2), and btype
(3)
variables (3 * 2 * 3 = 18).
Then we compute pairwise contrasts over this grid:
cmp <- comparisons(mod,
variables = list("btype" = "pairwise"),
newdata = nd,
type = "link")
nrow(cmp)
#> [1] 54
There are 3 pairwise contrasts, corresponding to the 3 pairwise
comparisons possible between the 3 levels of the focal variable
btype
: scold-curse
, shout-scold
,
shout-curse
. The comparisons()
function
estimates those 3 contrasts for each row of newdata
, so we
get \(18 \times 3 = 54\) rows.
Finally, if we wanted contrasts averaged over each subgroup of the
resp
variable, we can use the
avg_comparisons()
function with the by
argument:
avg_comparisons(mod,
by = "resp",
variables = list("btype" = "pairwise"),
newdata = nd,
type = "link")
#>
#> Term Contrast resp Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> btype mean(scold) - mean(curse) no -0.0152 0.1097 -0.139 0.88976 -0.230 0.19972
#> btype mean(scold) - mean(curse) perhaps -0.2393 0.1178 -2.031 0.04221 -0.470 -0.00842
#> btype mean(scold) - mean(curse) yes 0.0391 0.1292 0.302 0.76239 -0.214 0.29234
#> btype mean(shout) - mean(curse) no -0.2533 0.1022 -2.479 0.01319 -0.454 -0.05300
#> btype mean(shout) - mean(curse) perhaps -0.0834 0.1330 -0.627 0.53090 -0.344 0.17737
#> btype mean(shout) - mean(curse) yes 0.5802 0.1784 3.252 0.00115 0.230 0.92987
#> btype mean(shout) - mean(scold) no -0.2381 0.0886 -2.686 0.00723 -0.412 -0.06436
#> btype mean(shout) - mean(scold) perhaps 0.1559 0.1358 1.148 0.25103 -0.110 0.42215
#> btype mean(shout) - mean(scold) yes 0.5411 0.1888 2.866 0.00416 0.171 0.91116
#>
#> Columns: term, contrast, resp, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
These results are identical to those produced by emmeans
(except for \(t\) vs. \(z\)).
Marginal Effects
As far as I can tell, emmeans::emtrends
makes it easier
to compute marginal effects for a few user-specified values than for
large grids or for the full original dataset.
Response scale, user-specified values:
mod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)
emtrends(mod, ~hp, "hp", regrid = "response", at = list(cyl = 4))
#> hp hp.trend SE df asymp.LCL asymp.UCL
#> 147 -0.00786 0.011 Inf -0.0294 0.0137
#>
#> Confidence level used: 0.95
slopes(mod, newdata = datagrid(cyl = 4))
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % hp cyl
#> hp dY/dX -0.00785 0.011 -0.713 0.476 -0.0294 0.0137 147 4
#> cyl 6 - 4 -0.22219 0.394 -0.564 0.573 -0.9940 0.5496 147 4
#> cyl 8 - 4 -0.59469 0.511 -1.163 0.245 -1.5965 0.4071 147 4
#>
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, vs, hp, cyl
Link scale, user-specified values:
emtrends(mod, ~hp, "hp", at = list(cyl = 4))
#> hp hp.trend SE df asymp.LCL asymp.UCL
#> 147 -0.0326 0.0339 Inf -0.099 0.0338
#>
#> Confidence level used: 0.95
slopes(mod, type = "link", newdata = datagrid(cyl = 4))
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % hp cyl
#> hp dY/dX -0.0326 3.39e-02 -0.96144 0.336 -0.099 3.38e-02 147 4
#> cyl 6 - 4 -0.9049 1.63e+00 -0.55506 0.579 -4.100 2.29e+00 147 4
#> cyl 8 - 4 -19.5418 4.37e+03 -0.00447 0.996 -8579.030 8.54e+03 147 4
#>
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, vs, hp, cyl
More examples
Here are a few more emmeans
vs. marginaleffects
comparisons:
# Example of examining a continuous x categorical interaction using emmeans and marginaleffects
# Authors: Cameron Patrick and Vincent Arel-Bundock
library(tidyverse)
library(emmeans)
library(marginaleffects)
# use the mtcars data, set up am as a factor
data(mtcars)
<- mtcars |> mutate(am = factor(am))
mc
# fit a linear model to mpg with wt x am interaction
<- lm(mpg ~ wt*am, data = mc)
m summary(m)
# 1. means for each level of am at mean wt.
emmeans(m, "am")
marginal_means(m, variables = "am")
predictions(m, newdata = datagrid(am = 0:1))
# 2. means for each level of am at wt = 2.5, 3, 3.5.
emmeans(m, c("am", "wt"), at = list(wt = c(2.5, 3, 3.5)))
predictions(m, newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5))
# 3. means for wt = 2.5, 3, 3.5, averaged over levels of am (implicitly!).
emmeans(m, "wt", at = list(wt = c(2.5, 3, 3.5)))
# same thing, but the averaging is more explicit, using the `by` argument
predictions(
m,newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5)),
by = "wt")
# 4. graphical version of 2.
emmip(m, am ~ wt, at = list(wt = c(2.5, 3, 3.5)), CIs = TRUE)
plot_predictions(m, condition = c("wt", "am"))
# 5. compare levels of am at specific values of wt.
# this is a bit ugly because the emmeans defaults for pairs() are silly.
# infer = TRUE: enable confidence intervals.
# adjust = "none": begone, Tukey.
# reverse = TRUE: contrasts as (later level) - (earlier level)
pairs(emmeans(m, "am", by = "wt", at = list(wt = c(2.5, 3, 3.5))),
infer = TRUE, adjust = "none", reverse = TRUE)
comparisons(
m,variables = "am",
newdata = datagrid(wt = c(2.5, 3, 3.5)))
# 6. plot of pairswise comparisons
plot(pairs(emmeans(m, "am", by = "wt", at = list(wt = c(2.5, 3, 3.5))),
infer = TRUE, adjust = "none", reverse = TRUE))
# Since `wt` is numeric, the default is to plot it as a continuous variable on
# the x-axis. But not that this is the **exact same info** as in the emmeans plot.
plot_comparisons(m, variables = "am", condition = "wt")
# You of course customize everything, set draw=FALSE, and feed the raw data to feed to ggplot2
<- plot_comparisons(
p
m,variables = "am",
condition = list(wt = c(2.5, 3, 3.5)),
draw = FALSE)
ggplot(p, aes(y = wt, x = comparison, xmin = conf.low, xmax = conf.high)) +
geom_pointrange()
# 7. slope of wt for each level of am
emtrends(m, "am", "wt")
slopes(m, newdata = datagrid(am = 0:1))
margins
and prediction
The margins
and prediction
packages for R
were designed by Thomas Leeper to emulate
the behavior of the margins
command from
Stata
. These packages are trailblazers and strongly
influenced the development of marginaleffects
. The main
benefits of marginaleffects
over these packages are:
- Support more model types
- Faster
- Memory efficient
- Plots using
ggplot2
instead of Base R - More extensive test suite
- Active development
The syntax of the two packages is very similar.
Average Marginal Effects
library(margins)
library(marginaleffects)
mod <- lm(mpg ~ cyl + hp + wt, data = mtcars)
mar <- margins(mod)
summary(mar)
#> factor AME SE z p lower upper
#> cyl -0.9416 0.5509 -1.7092 0.0874 -2.0214 0.1382
#> hp -0.0180 0.0119 -1.5188 0.1288 -0.0413 0.0052
#> wt -3.1670 0.7406 -4.2764 0.0000 -4.6185 -1.7155
mfx <- slopes(mod)
Individual-Level Marginal Effects
Marginal effects in a user-specified data frame:
head(data.frame(mar))
#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt X_weights X_at_number
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 22.82043 0.6876212 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 22.01285 0.6056817 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.96040 0.7349593 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.93608 0.5800910 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.16780 0.8322986 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.25036 0.6638322 -0.9416168 -0.0180381 -3.166973 0.3035123 0.0001410453 0.5484517 NA 1
head(mfx)
#>
#> Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#> cyl -0.942 0.551 -1.71 0.0874 -2.02 0.138
#>
#> Columns: rowid, term, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, mpg, cyl, hp, wt
nd <- data.frame(cyl = 4, hp = 110, wt = 3)
Marginal Effects at the Mean
mar <- margins(mod, data = data.frame(prediction::mean_or_mode(mtcars)), unit_ses = TRUE)
data.frame(mar)
#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt SE_dydx_cyl SE_dydx_hp SE_dydx_wt X_weights X_at_number
#> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 20.09062 0.4439832 -0.9416168 -0.0180381 -3.166973 0.3035082 0.0001410452 0.5484409 0.5509157 0.01187625 0.7405679 NA 1
slopes(mod, newdata = "mean")
#>
#> Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> cyl -0.942 0.5509 -1.71 0.0874 -2.0214 0.13815
#> hp -0.018 0.0119 -1.52 0.1288 -0.0413 0.00524
#> wt -3.167 0.7406 -4.28 <0.001 -4.6185 -1.71546
#>
#> Columns: rowid, term, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, mpg, cyl, hp, wt
Counterfactual Average Marginal Effects
The at
argument of the margins
package
emulates Stata
by fixing the values of some variables at
user-specified values, and by replicating the full dataset several times
for each combination of the supplied values (see the Stata
section below). For example, if the dataset includes 32 rows and the
user calls at=list(cyl=c(4, 6))
, margins
will
compute 64 unit-level marginal effects estimates:
dat <- mtcars
dat$cyl <- factor(dat$cyl)
mod <- lm(mpg ~ cyl * hp + wt, data = mtcars)
mar <- margins(mod, at = list(cyl = c(4, 6, 8)))
summary(mar)
#> factor cyl AME SE z p lower upper
#> cyl 4.0000 0.0381 0.5998 0.0636 0.9493 -1.1374 1.2137
#> cyl 6.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139
#> cyl 8.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139
#> hp 4.0000 -0.0878 0.0267 -3.2937 0.0010 -0.1400 -0.0355
#> hp 6.0000 -0.0499 0.0154 -3.2397 0.0012 -0.0800 -0.0197
#> hp 8.0000 -0.0120 0.0108 -1.1065 0.2685 -0.0332 0.0092
#> wt 4.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236
#> wt 6.0000 -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236
#> wt 8.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236
avg_slopes(
mod,
by = "cyl",
newdata = datagridcf(cyl = c(4, 6, 8)))
#>
#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> cyl mean(dY/dX) 4 0.0381 0.5999 0.0636 0.9493 -1.1376 1.21390
#> cyl mean(dY/dX) 6 0.0381 0.5999 0.0636 0.9493 -1.1376 1.21390
#> cyl mean(dY/dX) 8 0.0381 0.5999 0.0636 0.9493 -1.1376 1.21390
#> hp mean(dY/dX) 4 -0.0878 0.0267 -3.2937 <0.001 -0.1400 -0.03554
#> hp mean(dY/dX) 6 -0.0499 0.0154 -3.2397 0.0012 -0.0800 -0.01970
#> hp mean(dY/dX) 8 -0.0120 0.0108 -1.1065 0.2685 -0.0332 0.00923
#> wt mean(dY/dX) 4 -3.1198 0.6613 -4.7175 <0.001 -4.4160 -1.82365
#> wt mean(dY/dX) 6 -3.1198 0.6613 -4.7175 <0.001 -4.4160 -1.82365
#> wt mean(dY/dX) 8 -3.1198 0.6613 -4.7175 <0.001 -4.4160 -1.82365
#>
#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
Adjusted Predictions
The syntax to compute adjusted predictions using the
predictions
package or marginaleffects
is very
similar:
prediction::prediction(mod) |> head()
#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 21.90488 0.6927034
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 21.10933 0.6266557
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.64753 0.6652076
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.04859 0.6041400
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.25445 0.7436172
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 19.53360 0.6436862
marginaleffects::predictions(mod) |> head()
#>
#> Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 21.9 0.693 31.6 <0.001 20.5 23.3
#> 21.1 0.627 33.7 <0.001 19.9 22.3
#> 25.6 0.665 38.6 <0.001 24.3 27.0
#> 20.0 0.604 33.2 <0.001 18.9 21.2
#> 17.3 0.744 23.2 <0.001 15.8 18.7
#> 19.5 0.644 30.3 <0.001 18.3 20.8
#>
#> Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, mpg, cyl, hp, wt
Stata
Stata
is a good but expensive software package for
statistical analysis. It is published by StataCorp LLC. This section
compares Stata
’s margins
command to
marginaleffects
.
The results produced by marginaleffects
are extensively
tested against Stata
. See the test
suite for a list of the dozens of models where we compared estimates
and standard errors.
Average Marginal Effect (AMEs)
Marginal effects are unit-level quantities. To compute “average marginal effects”, we first calculate marginal effects for each observation in a dataset. Then, we take the mean of those unit-level marginal effects.
Stata
Both Stata’s margins
command and the slopes
function can calculate average marginal effects (AMEs). Here is an
example showing how to estimate AMEs in Stata:
quietly reg mpg cyl hp wt
margins, dydx(*)
Average marginal effects Number of obs = 32
Model VCE : OLS
Expression : Linear prediction, predict()
dy/dx w.r.t. : cyl hp wt
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
cyl | -.9416168 .5509164 -1.71 0.098 -2.070118 .1868842
hp | -.0180381 .0118762 -1.52 0.140 -.0423655 .0062893
wt | -3.166973 .7405759 -4.28 0.000 -4.683974 -1.649972
------------------------------------------------------------------------------
marginaleffects
The same results can be obtained with slopes()
and
summary()
like this:
library("marginaleffects")
mod <- lm(mpg ~ cyl + hp + wt, data = mtcars)
avg_slopes(mod)
#>
#> Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> cyl -0.942 0.5509 -1.71 0.0874 -2.0214 0.13816
#> hp -0.018 0.0119 -1.52 0.1288 -0.0413 0.00524
#> wt -3.167 0.7406 -4.28 <0.001 -4.6185 -1.71547
#>
#> Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high
Note that Stata
reports t statistics while
marginaleffects
reports Z. This produces slightly different
p-values because this model has low degrees of freedom:
mtcars
only has 32 rows
Counterfactual Marginal Effects
A “counterfactual marginal effect” is a special quantity obtained by replicating a dataset while fixing some regressor to user-defined values.
Concretely, Stata computes counterfactual marginal effects in 3 steps:
- Duplicate the whole dataset 3 times and sets the values of
cyl
to the three specified values in each of those subsets. - Calculate marginal effects for each observation in that large grid.
- Take the average of marginal effects for each value of the variable of interest.
Stata
With the at
argument, Stata’s margins
command estimates average counterfactual marginal effects. Here
is an example:
quietly reg mpg i.cyl##c.hp wt
margins, dydx(hp) at(cyl = (4 6 8))
Average marginal effects Number of obs = 32
Model VCE : OLS
Expression : Linear prediction, predict()
dy/dx w.r.t. : hp
1._at : cyl = 4
2._at : cyl = 6
3._at : cyl = 8
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hp |
_at |
1 | -.099466 .0348665 -2.85 0.009 -.1712749 -.0276571
2 | -.0213768 .038822 -0.55 0.587 -.1013323 .0585787
3 | -.013441 .0125138 -1.07 0.293 -.0392137 .0123317
------------------------------------------------------------------------------
marginaleffects
You can estimate average counterfactual marginal effects with
slopes()
by using the datagridcf()
to create a
counterfactual dataset in which the full original dataset is replicated
for each potential value of the cyl
variable. Then, we tell
the by
argument to average within groups:
mod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)
avg_slopes(
mod,
variables = "hp",
by = "cyl",
newdata = datagridcf(cyl = c(4, 6, 8)))
#>
#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> hp mean(dY/dX) 4 -0.0995 0.0349 -2.853 0.00433 -0.1678 -0.0311
#> hp mean(dY/dX) 6 -0.0214 0.0388 -0.551 0.58188 -0.0975 0.0547
#> hp mean(dY/dX) 8 -0.0134 0.0125 -1.074 0.28278 -0.0380 0.0111
#>
#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
This is equivalent to taking the group-wise mean of observation-level
marginal effects (without the by
argument):
mfx <- slopes(
mod,
variables = "hp",
newdata = datagridcf(cyl = c(4, 6, 8)))
aggregate(estimate ~ term + cyl, data = mfx, FUN = mean)
#> term cyl estimate
#> 1 hp 4 -0.09946598
#> 2 hp 6 -0.02137679
#> 3 hp 8 -0.01344103
Note that following Stata
, the standard errors for
group-averaged marginal effects are computed by taking the “Jacobian at
the mean:”
Average Counterfactual Adjusted Predictions
Stata
Just like Stata’s margins
command computes average
counterfactual marginal effects, it can also estimate average
counterfactual adjusted predictions.
Here is an example:
quietly reg mpg i.cyl##c.hp wt
margins, at(cyl = (4 6 8))
Predictive margins Number of obs = 32
Model VCE : OLS
Expression : Linear prediction, predict()
1._at : cyl = 4
2._at : cyl = 6
3._at : cyl = 8
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | 17.44233 2.372914 7.35 0.000 12.55522 22.32944
2 | 18.9149 1.291483 14.65 0.000 16.25505 21.57476
3 | 18.33318 1.123874 16.31 0.000 16.01852 20.64785
------------------------------------------------------------------------------
Again, this is what Stata does in the background:
- It duplicates the whole dataset 3 times and sets the values of
cyl
to the three specified values in each of those subsets. - It calculates predictions for that large grid.
- It takes the average prediction for each value of
cyl
.
In other words, average counterfactual adjusted predictions as
implemented by Stata are a hybrid between predictions at the observed
values (the default in marginaleffects::predictions
) and
predictions at representative values.
marginaleffects
You can estimate average counterfactual adjusted predictions with
predictions()
by, first, setting the grid_type
argument of datagrid()
to "counterfactual"
and, second, by averaging the predictions using the by
argument of summary()
, or a manual function like
dplyr::summarise()
.
mod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)
predictions(
mod,
by = "cyl",
newdata = datagridcf(cyl = c(4, 6, 8)))
#>
#> cyl Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 4 17.4 2.37 7.35 <0.001 12.8 22.1
#> 6 18.9 1.29 14.65 <0.001 16.4 21.4
#> 8 18.3 1.12 16.31 <0.001 16.1 20.5
#>
#> Columns: cyl, estimate, std.error, statistic, p.value, conf.low, conf.high
predictions(
mod,
newdata = datagridcf(cyl = c(4, 6, 8))) |>
group_by(cyl) |>
summarize(AAP = mean(estimate))
#> # A tibble: 3 × 2
#> cyl AAP
#> <fct> <dbl>
#> 1 4 17.4
#> 2 6 18.9
#> 3 8 18.3
brmsmargins
The
brmsmargins
package is developed by Joshua Wiley:
This package has functions to calculate marginal effects from brms models ( http://paul-buerkner.github.io/brms/ ). A central motivator is to calculate average marginal effects (AMEs) for continuous and discrete predictors in fixed effects only and mixed effects regression models including location scale models.
The main advantage of brmsmargins
over
marginaleffects
is its ability to compute “Marginal
Coefficients” following the method described in Hedeker et al (2012).
The main advantages of marginaleffects
over
brmsmargins
are:
- Support for 60+ model types, rather than just the
brms
package. - Simpler user interface (subjective).
- At the time of writing (2022-05-25)
brmsmargins
did not support certainbrms
models such as those with multivariate or multinomial outcomes. It also did not support custom outcome transformations.
The rest of this section presents side-by-side replications of some
of the analyses from the brmsmargins
vignettes in order to
show highlight parallels and differences in syntax.
Marginal Effects for Fixed Effects Models
AMEs for Logistic Regression
Estimate a logistic regression model with brms
:
library(brms)
library(brmsmargins)
library(marginaleffects)
library(data.table)
library(withr)
h <- 1e-4
void <- capture.output(
bayes.logistic <- brm(
vs ~ am + mpg, data = mtcars,
family = "bernoulli", seed = 1234,
silent = 2, refresh = 0,
chains = 4L, cores = 4L)
)
Compute AMEs manually:
d1 <- d2 <- mtcars
d2$mpg <- d2$mpg + h
p1 <- posterior_epred(bayes.logistic, newdata = d1)
p2 <- posterior_epred(bayes.logistic, newdata = d2)
m <- (p2 - p1) / h
quantile(rowMeans(m), c(.5, .025, .975))
#> 50% 2.5% 97.5%
#> 0.07001767 0.05469002 0.09241120
Compute AMEs with brmsmargins
:
bm <- brmsmargins(
bayes.logistic,
add = data.frame(mpg = c(0, 0 + h)),
contrasts = cbind("AME MPG" = c(-1 / h, 1 / h)),
CI = 0.95,
CIType = "ETI")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 0.07110289 0.07001767 0.05469002 0.0924112 NA NA 0.95 ETI <NA> <NA> AME MPG
Compute AMEs using marginaleffects
:
avg_slopes(bayes.logistic)
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> am 1 - 0 -0.267 -0.4188 -0.0763
#> mpg dY/dX 0.070 0.0547 0.0924
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
The mpg
element of the Effect
column from
marginaleffects
matches the the M
column of
the output from brmsmargins
.
Marginal Effects for Mixed Effects Models
Estimate a mixed effects logistic regression model with
brms
:
d <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + 1
d <- data.table(
x = rep(rep(0:1, each = nObs / 2), times = nGroups))
d[, ID := rep(seq_len(nGroups), each = nObs)]
for (i in seq_len(nGroups)) {
d[ID == i, y := rbinom(
n = nObs,
size = 1,
prob = plogis(theta.location[i, 1] + theta.location[i, 2] * x))
]
}
copy(d)
})
void <- capture.output(
mlogit <- brms::brm(
y ~ 1 + x + (1 + x | ID), family = "bernoulli",
data = d, seed = 1234,
silent = 2, refresh = 0,
chains = 4L, cores = 4L)
)
#> Warning: There were 9 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
AME: Including Random Effects
bm <- brmsmargins(
mlogit,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "includeRE",
CI = .95,
CIType = "ETI")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 0.1118585 0.1117003 0.08167159 0.1434978 NA NA 0.95 ETI <NA> <NA> AME x
avg_slopes(mlogit)
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> x 1 - 0 0.111 0.0818 0.142
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
AME: Fixed Effects Only (Grand Mean)
bm <- brmsmargins(
mlogit,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "fixedonly",
CI = .95,
CIType = "ETI")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 0.1035957 0.1027827 0.06455371 0.1478527 NA NA 0.95 ETI <NA> <NA> AME x
avg_slopes(mlogit, re_formula = NA)
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> x 1 - 0 0.1 0.0636 0.142
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
Marginal Effects for Location Scale Models
AMEs for Fixed Effects Location Scale Models
Estimate a fixed effects location scale model with
brms
:
d <- withr::with_seed(
seed = 12345, code = {
nObs <- 1000L
d <- data.table(
grp = rep(0:1, each = nObs / 2L),
x = rnorm(nObs, mean = 0, sd = 0.25))
d[, y := rnorm(nObs,
mean = x + grp,
sd = exp(1 + x + grp))]
copy(d)
})
void <- capture.output(
ls.fe <- brm(bf(
y ~ 1 + x + grp,
sigma ~ 1 + x + grp),
family = "gaussian",
data = d, seed = 1234,
silent = 2, refresh = 0,
chains = 4L, cores = 4L)
)
Fixed effects only
bm <- brmsmargins(
ls.fe,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
CI = 0.95, CIType = "ETI",
effects = "fixedonly")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 1.616692 1.616495 0.7227972 2.466072 NA NA 0.95 ETI <NA> <NA> AME x
avg_slopes(ls.fe, re_formula = NA)
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> grp 1 - 0 1.02 0.342 1.69
#> x dY/dX 1.62 0.723 2.47
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
Discrete change and distributional parameter
(dpar
)
Compute the contrast between adjusted predictions on the
sigma
parameter, when grp=0
and
grp=1
:
bm <- brmsmargins(
ls.fe,
at = data.frame(grp = c(0, 1)),
contrasts = cbind("AME grp" = c(-1, 1)),
CI = 0.95, CIType = "ETI", dpar = "sigma",
effects = "fixedonly")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 4.897214 4.888706 4.412579 5.408414 NA NA 0.95 ETI <NA> <NA> AME grp
In marginaleffects
we use the comparisons()
function and the variables
argument:
avg_comparisons(
ls.fe,
variables = list(grp = 0:1),
dpar = "sigma")
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> grp 1 - 0 4.89 4.41 5.41
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
Marginal effect (continuous) on sigma
bm <- brmsmargins(
ls.fe,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
CI = 0.95, CIType = "ETI", dpar = "sigma",
effects = "fixedonly")
data.frame(bm$ContrastSummary)
#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
#> 1 4.459865 4.452953 3.52165 5.463088 NA NA 0.95 ETI <NA> <NA> AME x
avg_slopes(ls.fe, dpar = "sigma", re_formula = NA)
#>
#> Term Contrast Estimate 2.5 % 97.5 %
#> grp 1 - 0 4.89 4.41 5.41
#> x dY/dX 4.45 3.52 5.46
#>
#> Columns: term, contrast, estimate, conf.low, conf.high
effects
The effects
package was created by John Fox and colleagues.
-
marginaleffects
supports 30+ more model types thaneffects
. -
effects
focuses on the computation of “adjusted predictions.” The plots it produces are roughly equivalent to the ones produced by theplot_predictions
andpredictions
functions inmarginaleffects
. -
effects
does not appear support marginal effects (slopes), marginal means, or contrasts -
effects
uses Base graphics whereasmarginaleffects
usesggplot2
-
effects
includes a lot of very powerful options to customize plots. In contrast,marginaleffects
produces objects which can be customized by chainingggplot2
functions. Users can also callplot_predictions(model, draw=FALSE)
to create a prediction grid, and then work the raw data directly to create the plot they need
effects
offers several options which are not currently
available in marginaleffects
, including:
- Partial residuals plots
- Many types of ways to plot adjusted predictions: package vignette
modelbased
The modelbased
package is developed by the easystats
team.
This section is incomplete; contributions are welcome.
- Wrapper around
emmeans
to compute marginal means and marginal effects. - Powerful functions to create beautiful plots.
ggeffects
The ggeffects
package is developed by Daniel Lüdecke.
This section is incomplete; contributions are welcome.
- Wrapper around
emmeans
to compute marginal means.