Generate a data grid of user-specified values for use in the newdata argument of the predictions(), comparisons(), and slopes() functions. This is useful for defining where in the predictor space we want to evaluate the quantities of interest; for example, the predicted outcome or slope for a 37-year-old college graduate.
datagrid() generates data frames with combinations of "typical" or user-supplied predictor values.
datagridcf() generates "counterfactual" data frames by replicating the entire dataset once for every combination of predictor values supplied by the user.
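The counterfactual idea can be made concrete with a base-R sketch of the replication step for a single variable. The helper counterfactual_grid() below is hypothetical and is not how datagridcf() is implemented. It only mirrors the behaviour described above: each copy of the data keeps its observed values, except for the variable being overwritten.

```r
# Hypothetical helper, NOT part of marginaleffects: replicate `data` once per
# value in `values`, overwriting column `var` in each copy.
counterfactual_grid <- function(data, var, values) {
  copies <- lapply(values, function(v) {
    d <- data
    d[[var]] <- v   # every row of this copy gets the counterfactual value
    d
  })
  do.call(rbind, copies)  # stack copies: nrow(data) * length(values) rows
}

cf <- counterfactual_grid(mtcars, "hp", c(100, 110))
nrow(cf)      # 64
table(cf$hp)  # 32 rows at hp = 100 and 32 rows at hp = 110
```

Unlike datagridcf(), this sketch handles only one variable. For several variables, the same replication would be applied once per row of their combinations.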
Usage
datagrid(
...,
model = NULL,
newdata = NULL,
by = NULL,
FUN_character = get_mode,
FUN_factor = get_mode,
FUN_logical = get_mode,
FUN_numeric = function(x) mean(x, na.rm = TRUE),
FUN_integer = function(x) round(mean(x, na.rm = TRUE)),
FUN_other = function(x) mean(x, na.rm = TRUE),
grid_type = "typical"
)
datagridcf(..., model = NULL, newdata = NULL)
Arguments
- ...
named arguments with vectors of values or functions for user-specified variables. Functions are applied to the variable in the model dataset or newdata, and must return a vector of the appropriate type. Character vectors are automatically transformed to factors if necessary. The output will include all combinations of these variables (see Examples below).
- model
Model object
- newdata
data.frame (one and only one of the model and newdata arguments can be used).
- by
character vector with grouping variables within which FUN_* functions are applied to create "sub-grids" with unspecified variables.
- FUN_character
the function to be applied to character variables.
- FUN_factor
the function to be applied to factor variables.
- FUN_logical
the function to be applied to logical variables.
- FUN_numeric
the function to be applied to numeric variables.
- FUN_integer
the function to be applied to integer variables.
- FUN_other
the function to be applied to other variable types.
- grid_type
character string, one of:
"typical": variables whose values are not explicitly specified by the user in the ... argument are set to their mean or mode, or to the output of the functions supplied to the FUN_* arguments.
"counterfactual": the entire dataset is duplicated for each combination of the variable values specified in the ... argument. Variables not explicitly supplied to datagrid() are set to their observed values in the original dataset.
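The "typical" rule can be sketched in base R. Both helpers, mode_of() and typical_grid(), are hypothetical stand-ins that assume the default FUN_* behaviour shown in Usage: the mode for character, factor, and logical columns, the rounded mean for integers, and the mean otherwise. They ignore custom FUN_* arguments, model input, and the other options of datagrid(). The last lines give a rough analogue of by, which builds one sub-grid per level of a grouping variable and stacks the results.

```r
# Hypothetical stand-in for get_mode(): the most frequent value of x.
mode_of <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical sketch of a "typical" grid. Each column the user did not
# specify is summarised with a default rule, then crossed with every
# combination of the user-supplied values.
# Assumes at least one variable is supplied in `...`.
typical_grid <- function(data, ...) {
  user <- list(...)
  others <- setdiff(names(data), names(user))
  typical <- lapply(data[others], function(x) {
    if (is.character(x) || is.factor(x) || is.logical(x)) {
      mode_of(x)                      # like FUN_character/FUN_factor/FUN_logical
    } else if (is.integer(x)) {
      round(mean(x, na.rm = TRUE))    # like FUN_integer
    } else {
      mean(x, na.rm = TRUE)           # like FUN_numeric/FUN_other
    }
  })
  grid <- expand.grid(user, stringsAsFactors = FALSE)
  for (v in others) grid[[v]] <- typical[[v]]  # recycle each summary value
  grid
}

g <- typical_grid(mtcars, hp = c(100, 110))
nrow(g)  # 2

# Rough analogue of `by`: one sub-grid per level of `am`, then stacked.
# Within each group `am` is constant, so its mean is the group's own value.
by_am <- do.call(rbind, lapply(split(mtcars, mtcars$am), typical_grid,
                               hp = c(100, 110)))
nrow(by_am)  # 4
```

Every column of mtcars is stored as double, so all unspecified columns fall into the mean branch. That is why mpg is 20.09062 in both rows, matching the first example in Examples.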
Value
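A data.frame in which each row corresponds to one combination of the named predictors supplied by the user via the ... dots. Variables that are not explicitly defined are held at their mean or mode. With grid_type = "counterfactual", the full dataset is instead repeated for each combination, and unspecified variables keep their observed values.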
Details
If datagrid() is used as the newdata argument in a predictions(), comparisons(), or slopes() call, the model is automatically inserted in the model argument of the datagrid() call, and users do not need to specify either the model or newdata arguments.
If users supply a model, the data used to fit that model is retrieved using the insight::get_data() function.
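The package retrieves model data with insight::get_data(). As a rough base-R analogue, model.frame() returns the variables used to fit an lm model. This helps explain why the model-based grid in Examples contains only mpg and hp. The output of insight::get_data() may differ from model.frame() in detail, for example for models with transformed variables, so treat this only as an approximation.

```r
mod <- lm(mpg ~ hp, mtcars)
fit_data <- model.frame(mod)  # base-R analogue, not insight::get_data()
names(fit_data)  # "mpg" "hp"
nrow(fit_data)   # 32
```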
Examples
# The output only has 2 rows, and all the variables except `hp` are at their
# mean or mode.
datagrid(newdata = mtcars, hp = c(100, 110))
#> mpg cyl disp drat wt qsec vs am gear
#> 1 20.09062 6.1875 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875
#> 2 20.09062 6.1875 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875
#> carb hp
#> 1 2.8125 100
#> 2 2.8125 110
# Feeding a model instead of a data.frame gives the same values, but the grid
# only includes the variables used to fit the model.
mod <- lm(mpg ~ hp, mtcars)
datagrid(model = mod, hp = c(100, 110))
#> mpg hp
#> 1 20.09062 100
#> 2 20.09062 110
# Use in `marginaleffects` to compute "Typical Marginal Effects". When used
# in `slopes()` or `predictions()`, we do not need to specify the
# `model` or `newdata` arguments of `datagrid()`.
slopes(mod, newdata = datagrid(hp = c(100, 110)))
#>
#> Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % hp
#> hp -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484 100
#> hp -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484 110
#>
#> Columns: rowid, term, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, mpg, hp
#>
# datagrid() also accepts functions, which are applied to the matching column
datagrid(hp = range, cyl = unique, newdata = mtcars)
#> mpg disp drat wt qsec vs am gear carb hp
#> 1 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 52
#> 2 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 52
#> 3 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 52
#> 4 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#> 5 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#> 6 20.09062 230.7219 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 335
#> cyl
#> 1 6
#> 2 4
#> 3 8
#> 4 6
#> 5 4
#> 6 8
comparisons(mod, newdata = datagrid(hp = fivenum))
#>
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> hp +1 -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484
#> hp +1 -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484
#> hp +1 -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484
#> hp +1 -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484
#> hp +1 -0.0682 0.0101 -6.74 <0.001 -0.0881 -0.0484
#>
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, mpg, hp
#>
# The full dataset is duplicated with each observation given counterfactual
# values of 100 and 110 for the `hp` variable. The original `mtcars` includes
# 32 rows, so the resulting dataset includes 64 rows.
dg <- datagrid(newdata = mtcars, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
#> [1] 64
# Feeding a model instead of a data.frame also yields 64 rows, but only the
# model variables (`mpg` and `hp`) are included.
mod <- lm(mpg ~ hp, mtcars)
dg <- datagrid(model = mod, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
#> [1] 64