library(modelsummary)

Adding new models

modelsummary relies on two functions from the broom package to extract model information: tidy and glance. If broom doesn’t support the type of model you are trying to summarize, modelsummary won’t support it out of the box. Thankfully, it is extremely easy to add support for most models using custom methods.

For example, models produced by the MCMCglmm package are not currently supported by broom. To add support, you simply need to create a tidy and a glance method:

# load packages and data
library(modelsummary)
library(MCMCglmm)
data(PlodiaPO)

# add custom functions to extract estimates (tidy) and goodness-of-fit (glance) information
tidy.MCMCglmm <- function(object, ...) {
    s <- summary(object, ...)
    ret <- tibble::tibble(term = row.names(s$solutions),
                          estimate = s$solutions[, 1],
                          conf.low = s$solutions[, 2],
                          conf.high = s$solutions[, 3])
    ret
}
glance.MCMCglmm <- function(object, ...) {
    ret <- tibble::tibble(dic = object$DIC,
                          n = nrow(object$X))
    ret
}

# estimate a simple model
model <- MCMCglmm(PO ~ 1 + plate, random = ~ FSfamily, data = PlodiaPO, verbose=FALSE, pr=TRUE)

# summarize the model
msummary(model, statistic = 'conf.int')

Two important things to note. First, the methods are named tidy.MCMCglmm and glance.MCMCglmm because the model object I am trying to summarize is of class MCMCglmm. You can find the class of a model by running: class(model).

Second, in the example above, we used the statistic = 'conf.int' argument. This is because the tidy method produces conf.low and conf.high columns. In most cases, users will define std.error column in their custom tidy methods, so the statistic argument will need to be adjusted.

If you create new tidy and glance methods, please consider contributing them to broom so that the rest of the community can benefit from your work: https://github.com/tidymodels/broom

Customizing existing models

The set of available statistics for any given model is determined by the output of the broom::glance function. Sometimes, you want to include information that is not included in glance output. For instance, you may want to display whether certain regression models include “fixed effects”.

One way to circumvent the limitations of glance is to create a custom method for your models. To illustrate, we load the lfe package, which provides the felm function to estimate linear regression models with fixed effects. Then, we simulate some data:

library(lfe)
library(broom)
library(modelsummary)

x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u

The models produced by felm store the names of fixed effects in an object called “fe”. For example, if x is your models, then x$fe will be a character vector with the names of all your fixed effects variables.

We create a custom glance method that extracts this information and returns it alongside the regular glance output:

glance.felm_custom <- function(x, ...) {
    out <- broom:::glance.felm(x, ...)
    for (fe in names(x$fe)) {
        out[[paste('Fixed effects:', fe)]] <- 'X'
    }
    out
}

Then, we create a function to fit our models. This function passes all its arguments forward to felm using ellipses (...). Then, it assigns a new class name. This tells modelsummary that it should extract fixed effects names:

felm_custom <- function(...) {
    out <- felm(...)
    class(out) <- c('felm_custom', class(out))
    out
}

Finally, we fit models and print the table:

mod <- list()
mod[[1]] <- lm(y ~ x + x2)
mod[[2]] <- felm_custom(y ~ x + x2 | id)
mod[[3]] <- felm_custom(y ~ x + x2 | firm)
mod[[4]] <- felm_custom(y ~ x + x2 | id + firm)

msummary(mod)
Model 1 Model 2 Model 3 Model 4
(Intercept) -0.450
(0.056)
x 1.023 0.982 1.051 1.013
(0.055) (0.048) (0.044) (0.032)
x2 0.529 0.553 0.509 0.530
(0.057) (0.050) (0.045) (0.034)
Num.Obs. 1000 1000 1000 1000
R2 0.300 0.493 0.570 0.774
Adj.R2 0.298 0.482 0.564 0.766
AIC 3968.6 3683.7 3504.0 2900.4
BIC 3988.2 3796.6 3582.5 3072.2
Log.Lik. -1980.289 -1818.849 -1735.999 -1415.217
Fixed effects: firm X X
Fixed effects: id X X

Raw data

The gt package allows a bunch more customization and styling. Power users can use modelsummary’s extract function to produce a tibble which can easily be fed into gt.

> modelsummary::extract(models)
# A tibble: 21 x 8
   group     term        statistic `OLS 1` `NBin 1` `OLS 2` `NBin 2` `Logit 1`
   <chr>     <chr>       <chr>     <chr>   <chr>    <chr>   <chr>    <chr>
 1 estimates (Intercept) estimate  64.114  4.218    57.331  4.384    1.006
 2 estimates (Intercept) statistic (5.247) (0.144)  (8.315) (0.233)  (0.710)
 3 estimates Crime_prop  estimate  -0.002  -0.000   -0.002  -0.000   -0.000
 4 estimates Crime_prop  statistic (0.001) (0.000)  (0.001) (0.000)  (0.000)
 5 estimates Infants     estimate  -0.001  ""       0.000   ""       -0.000
 6 estimates Infants     statistic (0.000) ""       (0.000) ""       (0.000)
 7 estimates Donations   estimate  ""      -0.000   ""      -0.000   ""
 8 estimates Donations   statistic ""      (0.000)  ""      (0.000)  ""
 9 gof       R2          ""        0.237   ""       0.073   ""       ""
10 gof       Adj.R2      ""        0.218   ""       0.051   ""       ""
# … with 11 more rows

Multiple imputation results

modelsummary can pool and display analyses on several datasets imputed using the mice or Amelia packages. In this example, we use convenience functions from the mitools package to format our data:

library(modelsummary)
library(mitools)
suppressMessages(library(mice))
suppressMessages(library(Amelia))

# Download data from `Rdatasets`
url <- 'https://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'
dat <- read.csv(url)[, c('Clergy', 'Commerce', 'Literacy')]

# Insert missing values
dat$Clergy[sample(1:nrow(dat), 10)] <- NA
dat$Commerce[sample(1:nrow(dat), 10)] <- NA
dat$Literacy[sample(1:nrow(dat), 10)] <- NA

# Impute with `mice` and `Amelia`
dat_mice <- mice(dat, m = 5, printFlag = FALSE)
dat_amelia <- amelia(dat, m = 5, p2s = 0)
dat_amelia <- imputationList(dat_amelia$imputations)

# Estimate models
mod <- list()
mod[['Listwise deletion']] <- lm(Clergy ~ Literacy + Commerce, dat)
mod[['Mice']] <- with(dat_mice, lm(Clergy ~ Literacy + Commerce))
mod[['Amelia']] <- MIcombine(with(dat_amelia, lm(Clergy ~ Literacy + Commerce)))

# Summarize
msummary(mod)
Listwise deletion Mice Amelia
(Intercept) 81.782 82.221 76.650
(14.993) (15.534) (12.959)
Commerce -0.409 -0.334 -0.321
(0.158) (0.156) (0.141)
Literacy -0.546 -0.621 -0.527
(0.240) (0.251) (0.203)
Num.Obs. 60 86
R2 0.118 0.129
Adj.R2 0.087 0.107
AIC 556.9
BIC 565.3
Log.Lik. -274.450
nimp 5 5