What to do when marginaleffects is slow?

Some options (each is illustrated in the short sketch after this list):

  1. Compute marginal effects and contrasts at the mean (or other representative value) instead of all observed rows of the original dataset: Use the newdata argument and the datagrid() function.
  2. Compute marginal effects for only a subset of variables, taking care to exclude factor variables, which can be particularly costly to process: Use the variables argument.
  3. Do not compute standard errors: Use the vcov = FALSE argument.
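
The snippet below is a minimal sketch of these three options applied to a small toy model. The data and variable names (y, x1, x2) are invented purely for illustration; only the marginaleffects arguments matter here.

library(marginaleffects)

# toy data and model, for illustration only
dat <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))
mod <- lm(y ~ x1 + x2, data = dat)

# Option 1: evaluate at the mean of the regressors (1 row) instead of all 1000 rows
marginaleffects(mod, newdata = datagrid())

# Option 2: compute marginal effects for a single variable
marginaleffects(mod, variables = "x1")

# Option 3: skip the standard errors
marginaleffects(mod, vcov = FALSE)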

This simulation illustrates how computation time varies for a model with 25 regressors and 100,000 observations:

library(marginaleffects)

# simulate data and fit a large model
N <- 1e5
dat <- data.frame(matrix(rnorm(N * 26), ncol = 26))
mod <- lm(X1 ~ ., dat)

results <- bench::mark(
    # marginal effects at the mean; no standard error
    marginaleffects(mod, vcov = FALSE, newdata = "mean"),
    # marginal effects at the mean
    marginaleffects(mod, newdata = "mean"),
    # 1 variable; no standard error
    marginaleffects(mod, vcov = FALSE, variables = "X3"),
    # 1 variable
    marginaleffects(mod, variables = "X3"),
    # 25 variables; no standard error
    marginaleffects(mod, vcov = FALSE),
    # 25 variables
    marginaleffects(mod),
    iterations = 1, check = FALSE)

results[, c(1, 3, 5)]
#   expression                                             median mem_alloc
#   <bch:expr>                                           <bch:tm> <bch:byt>
# 1 marginaleffects(mod, vcov = FALSE, newdata = "mean") 141.04ms  233.94MB
# 2 marginaleffects(mod, newdata = "mean")               276.61ms  236.18MB
# 3 marginaleffects(mod, vcov = FALSE, variables = "X3") 193.81ms  385.33MB
# 4 marginaleffects(mod, variables = "X3")                  2.85s    3.14GB
# 5 marginaleffects(mod, vcov = FALSE)                       4.32s    7.62GB
# 6 marginaleffects(mod)                                     1.15m   76.55GB

The benchmarks above were conducted using the development version of marginaleffects on 2022-04-15.

Speed comparison

The marginaleffects functions are fast compared to alternatives such as the margins package. This simulation was conducted using the development version of the package on 2022-04-04:

library(margins)

N <- 1e3
dat <- data.frame(
    y = sample(0:1, N, replace = TRUE),
    x1 = rnorm(N),
    x2 = rnorm(N),
    x3 = rnorm(N),
    x4 = factor(sample(letters[1:5], N, replace = TRUE)))
mod <- glm(y ~ x1 + x2 + x3 + x4, data = dat, family = binomial)

marginaleffects is about 6x faster than margins when unit-level standard errors are not computed:

results <- bench::mark(
    marginaleffects(mod, vcov = FALSE),
    margins(mod, unit_ses = FALSE),
    check = FALSE, relative = TRUE)
results[, c(1, 3, 5)]

#   expression                        median mem_alloc
#   <bch:expr>                          <dbl>     <dbl>
# 1 marginaleffects(mod, vcov = FALSE)   1         1
# 2 margins(mod, unit_ses = FALSE)       6.15      4.17

marginaleffects can be nearly 600 times faster than margins when unit-level standard errors are computed:

results <- bench::mark(
    marginaleffects(mod, vcov = TRUE),
    margins(mod, unit_ses = TRUE),
    check = FALSE, relative = TRUE, iterations = 1)
results[, c(1, 3, 5)]

#   expression                        median mem_alloc
#   <bch:expr>                         <dbl>     <dbl>
# 1 marginaleffects(mod, vcov = TRUE)     1        1
# 2 margins(mod, unit_ses = TRUE)       581.      20.5

Models estimated on larger datasets (more than about 1,000 observations) can be difficult to process with the margins package because of time and memory constraints. In contrast, marginaleffects can work well on much larger datasets.
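
As a rough illustration (not a benchmark), the sketch below refits the same logistic regression on 100,000 simulated observations and computes unit-level estimates without standard errors. The object names dat_big and mod_big are arbitrary, and no timings are reported because they vary by machine.

# same simulation as above, but with 100,000 observations
N <- 1e5
dat_big <- data.frame(
    y = sample(0:1, N, replace = TRUE),
    x1 = rnorm(N),
    x2 = rnorm(N),
    x3 = rnorm(N),
    x4 = factor(sample(letters[1:5], N, replace = TRUE)))
mod_big <- glm(y ~ x1 + x2 + x3 + x4, data = dat_big, family = binomial)

# unit-level marginal effects without standard errors remain tractable
mfx <- marginaleffects(mod_big, vcov = FALSE)
nrow(mfx)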

In some cases, marginaleffects will be considerably slower than packages like emmeans or modmarg, because those packages make extensive use of hard-coded analytical derivatives or implement their own fast prediction functions.
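
For context, here is a rough sketch of how a related quantity could be obtained with emmeans on the logit model defined above. This is only an approximate analogue: by default, emtrends() computes trends on the link scale for a glm, so its output is not directly comparable to marginaleffects() estimates on the response scale.

library(emmeans)

# slope of x1 at each level of the factor x4, other covariates held at reference values
# (for a glm, this trend is computed on the link scale by default)
emtrends(mod, ~ x4, var = "x1")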