
Generate a data grid of "typical," "counterfactual," or user-specified values for use in the newdata argument of the marginaleffects or predictions functions.

Usage

datagrid(
  ...,
  model = NULL,
  newdata = NULL,
  grid_type = "typical",
  FUN_character = Mode,
  FUN_factor = Mode,
  FUN_logical = Mode,
  FUN_numeric = function(x) mean(x, na.rm = TRUE),
  FUN_other = function(x) mean(x, na.rm = TRUE)
)

Arguments

...

named arguments with vectors of values or functions for user-specified variables.

  • Functions are applied to the variable in the model dataset or newdata, and must return a vector of the appropriate type.

  • Character vectors are automatically transformed to factors if necessary.

The output will include all combinations of these variables (see Examples below).
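As a rough illustration of what "all combinations" means, the following base R sketch applies two functions to columns of `mtcars` and crosses the results. It is not the package's implementation; the names `hp_values`, `cyl_values`, and `combos` are chosen here for illustration only.

```r
# Base R sketch (an assumption, not how datagrid() is implemented):
# each function is applied to its column, then the results are crossed.
hp_values  <- range(mtcars$hp)    # smallest and largest observed hp
cyl_values <- unique(mtcars$cyl)  # distinct cylinder counts

# expand.grid() returns one row per combination of the supplied values
combos <- expand.grid(hp = hp_values, cyl = cyl_values)
nrow(combos)
#> [1] 6
```

Two `hp` values crossed with three `cyl` values give six rows, which matches the row count of the `datagrid(hp = range, cyl = unique, newdata = mtcars)` example further down.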

model

Model object

newdata

data.frame (one and only one of the model and newdata arguments should be supplied)

grid_type

character

  • "typical": variables whose values are not explicitly specified by the user in ... are set to their mean or mode, or to the output of the functions supplied to FUN_type arguments.

  • "counterfactual": the entire dataset is duplicated for each combination of the variable values specified in .... Variables not explicitly supplied to datagrid() are set to their observed values in the original dataset.
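The difference between the two grid types can be sketched in base R. This is only a conceptual approximation under the assumption that every other column is numeric and summarized by its mean; the objects `typical` and `counterfactual` are illustrative names, not package objects.

```r
hp_values <- c(100, 110)
others <- setdiff(names(mtcars), "hp")

# "typical": one row per user-specified value; every other column is
# collapsed to a single summary value (here, its mean)
typical <- data.frame(as.list(colMeans(mtcars[others])), hp = hp_values)

# "counterfactual": the full dataset is repeated once per user-specified
# value, so other columns keep their observed values
# (merge() with by = NULL returns the Cartesian product)
counterfactual <- merge(mtcars[others], data.frame(hp = hp_values), by = NULL)

nrow(typical)
#> [1] 2
nrow(counterfactual)
#> [1] 64
```

The typical grid grows only with the number of requested combinations, while the counterfactual grid grows with the number of combinations times the number of rows in the original data.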

FUN_character

the function to be applied to character variables.

FUN_factor

the function to be applied to factor variables.

FUN_logical

the function to be applied to logical variables.

FUN_numeric

the function to be applied to numeric variables.

FUN_other

the function to be applied to other variable types.
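Each `FUN_*` argument takes a function that reduces a column to a single representative value. The sketch below shows what such functions might look like. The helper `mode_value` is hypothetical and written here for illustration; it is not claimed to match the package's own `Mode`. The name `median_value` is likewise an assumption.

```r
# Hypothetical helper: most frequent non-missing value of a vector
mode_value <- function(x) {
  ux <- unique(x[!is.na(x)])
  # tabulate() counts occurrences of each unique value and ignores NA
  ux[which.max(tabulate(match(x, ux)))]
}

# A possible alternative to the default numeric summary (the mean)
median_value <- function(x) median(x, na.rm = TRUE)

mode_value(c("a", "b", "b", NA))
#> [1] "b"
median_value(c(1, 2, 10, NA))
#> [1] 2
```

Functions of this shape could be passed to arguments such as `FUN_numeric` or `FUN_character`, for instance to hold numeric variables at their median instead of their mean.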

Value

A data.frame in which each row corresponds to one combination of the named predictors supplied by the user via the ... dots. Variables which are not explicitly defined are held at their mean or mode.

Details

If datagrid() is used as the newdata argument in a marginaleffects() or predictions() call, the model is automatically inserted in the function call, and users do not need to specify either the model or newdata arguments. Note that only the variables used to fit the model will be attached to the results. To attach other variables as well (e.g., weights or grouping variables), supply a data.frame explicitly to the newdata argument inside datagrid().

If users supply a model, the data used to fit that model is retrieved using the insight::get_data function.

See also

Other grid: datagridcf()

Examples

# The output only has 2 rows, and all the variables except `hp` are at their
# mean or mode.
datagrid(newdata = mtcars, hp = c(100, 110))
#>        mpg    cyl     disp     drat      wt     qsec     vs      am   gear
#> 1 20.09062 6.1875 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875
#> 2 20.09062 6.1875 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875
#>     carb  hp
#> 1 2.8125 100
#> 2 2.8125 110

# We can feed a model instead of a data.frame. The grid is then built from
# the data used to fit the model rather than from the full `mtcars`.
mod <- lm(mpg ~ hp, mtcars)
datagrid(model = mod, hp = c(100, 110))
#>    hp
#> 1 100
#> 2 110

# Use in `marginaleffects()` to compute "Typical Marginal Effects". When used
# in `marginaleffects()` or `predictions()` we do not need to specify the
# `model` or `newdata` arguments.
marginaleffects(mod, newdata = datagrid(hp = c(100, 110)))
#>   rowid     type term        dydx std.error statistic      p.value    conf.low
#> 1     1 response   hp -0.06822828 0.0101193 -6.742389 1.558038e-11 -0.08806175
#> 2     2 response   hp -0.06822828 0.0101193 -6.742389 1.558038e-11 -0.08806175
#>     conf.high predicted predicted_hi predicted_lo  hp mpg    eps
#> 1 -0.04839481  23.27603     23.27410     23.27603 100  21 0.0283
#> 2 -0.04839481  22.59375     22.59182     22.59375 110  21 0.0283

# datagrid accepts functions
datagrid(hp = range, cyl = unique, newdata = mtcars)
#>        mpg     disp     drat      wt     qsec     vs      am   gear   carb  hp
#> 1 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125  52
#> 2 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125  52
#> 3 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125  52
#> 4 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#> 5 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#> 6 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#>   cyl
#> 1   6
#> 2   4
#> 3   8
#> 4   6
#> 5   4
#> 6   8
comparisons(mod, newdata = datagrid(hp = fivenum))
#>   rowid     type term contrast  comparison std.error statistic      p.value
#> 1     1 response   hp       +1 -0.06822828 0.0101193 -6.742389 1.558038e-11
#> 2     2 response   hp       +1 -0.06822828 0.0101193 -6.742389 1.558038e-11
#> 3     3 response   hp       +1 -0.06822828 0.0101193 -6.742389 1.558037e-11
#> 4     4 response   hp       +1 -0.06822828 0.0101193 -6.742389 1.558037e-11
#> 5     5 response   hp       +1 -0.06822828 0.0101193 -6.742389 1.558038e-11
#>      conf.low   conf.high predicted predicted_hi predicted_lo  hp mpg    eps
#> 1 -0.08806175 -0.04839481 26.550990    26.516876    26.585104  52  21 0.0283
#> 2 -0.08806175 -0.04839481 23.548946    23.514832    23.583060  96  21 0.0283
#> 3 -0.08806175 -0.04839481 21.706782    21.672668    21.740896 123  21 0.0283
#> 4 -0.08806175 -0.04839481 17.817770    17.783656    17.851885 180  21 0.0283
#> 5 -0.08806175 -0.04839481  7.242387     7.208273     7.276502 335  21 0.0283

# The full dataset is duplicated with each observation given counterfactual
# values of 100 and 110 for the `hp` variable. The original `mtcars` includes
# 32 rows, so the resulting dataset includes 64 rows.
dg <- datagrid(newdata = mtcars, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
#> [1] 64

# We get the same result by feeding a model instead of a data.frame
mod <- lm(mpg ~ hp, mtcars)
dg <- datagrid(model = mod, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
#> [1] 64