vignettes/customization.Rmd
When analysts create tables to summarize statistical models, they often want to customize the information displayed in those tables (parameters, statistics, significant digits, etc.). The `modelsummary` function includes a powerful and intuitive set of arguments which allow users to change the content of their tables.
In addition, analysts often want to customize the appearance of their tables. To achieve this, `modelsummary` supports two table-making packages: `gt` and `kableExtra`. These two packages open endless possibilities for customization. Each has different strengths and weaknesses. For instance, `gt` allows seamless integration with the RStudio IDE, but `kableExtra`’s LaTeX (and PDF) output is far more mature. The choice between `gt` and `kableExtra` should largely depend on the output format that users target:
- `gt` is best for HTML, RTF (MS Word-compatible), JPG, PNG
- `kableExtra` is best for HTML, LaTeX, Markdown/Text, Rmarkdown PDF
Users are encouraged to read the documentation of both packages to see which syntax they prefer.
`modelsummary` can produce tables in a large array of formats. This table shows which package is used by default to create tables in each output format:
Output format | Default package |
---|---|
html | gt |
latex | kableExtra |
markdown | kableExtra |
filename.rtf | gt |
filename.tex | kableExtra |
filename.md | kableExtra |
filename.txt | kableExtra |
filename.png | kableExtra |
filename.jpg | kableExtra |
Rmarkdown PDF | kableExtra |
Rmarkdown HTML | gt |
Both `gt` and `kableExtra` can produce LaTeX and HTML output. You can override the default settings by setting these global options:
modelsummary
`modelsummary` includes a powerful set of utilities to customize the information displayed in your model summary tables. You can easily rename, reorder, subset or omit parameter estimates; choose the set of goodness-of-fit statistics to display; display various “robust” standard errors or confidence intervals; add titles, footnotes, or source notes; insert stars or custom characters to indicate levels of statistical significance; or add rows with supplemental information about your models.
library(modelsummary)
library(kableExtra)
library(gt)
url <- 'https://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'
dat <- read.csv(url)
models <- list()
models[['OLS 1']] <- lm(Donations ~ Literacy + Clergy, data = dat)
models[['Poisson 1']] <- glm(Donations ~ Literacy + Commerce, family = poisson, data = dat)
models[['OLS 2']] <- lm(Crime_pers ~ Literacy + Clergy, data = dat)
models[['Poisson 2']] <- glm(Crime_pers ~ Literacy + Commerce, family = poisson, data = dat)
models[['OLS 3']] <- lm(Crime_prop ~ Literacy + Clergy, data = dat)
By default, `modelsummary` prints an uncertainty estimate in parentheses below the corresponding coefficient estimate. The value of this estimate is determined by the `statistic` argument.
`statistic` must be a string equal to `conf.int` or to the name of one of the columns produced by the `broom::tidy` function.
msummary(models, statistic = 'std.error')
msummary(models, statistic = 'p.value')
msummary(models, statistic = 'statistic')
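For orientation, the quantities that `statistic` selects can be inspected directly. The sketch below uses base R's `summary()` on a small model fitted to the built-in `mtcars` data (an assumption made here so the example runs offline; it is not one of the models above). Its four columns correspond to the `estimate`, `std.error`, `statistic`, and `p.value` columns that `broom::tidy` reports for an `lm` fit.

```r
# Illustrative model on built-in data (not one of the models in this vignette)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient matrix: one row per term, one column per quantity
coefs <- summary(mod)$coefficients

# Column names: estimate, standard error, test statistic, p-value
print(colnames(coefs))
print(round(coefs, 3))
```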
You can display confidence intervals in brackets by setting `statistic = "conf.int"`:
msummary(models, statistic = 'conf.int', conf_level = .99)
| | OLS 1 | Poisson 1 | OLS 2 | Poisson 2 | OLS 3 |
|---|---|---|---|---|---|
| (Intercept) | 7948.667 | 8.241 | 16259.384 | 9.876 | 11243.544 |
| | [2469.565, 13427.769] | [8.226, 8.256] | [9375.457, 23143.311] | [9.867, 9.885] | [8577.542, 13909.546] |
| Clergy | 15.257 | | 77.148 | | -16.376 |
| | [-52.591, 83.105] | | [-8.096, 162.392] | | [-49.389, 16.637] |
| Literacy | -39.121 | 0.003 | 3.680 | -0.000 | -68.507 |
| | [-136.804, 58.562] | [0.003, 0.003] | [-119.048, 126.408] | [-0.000, -0.000] | [-116.037, -20.976] |
| Commerce | | 0.011 | | 0.001 | |
| | | [0.011, 0.011] | | [0.001, 0.001] | |
| Num.Obs. | 86 | 86 | 86 | 86 | 86 |
| R2 | 0.020 | | 0.065 | | 0.152 |
| Adj.R2 | -0.003 | | 0.043 | | 0.132 |
| AIC | 1740.8 | 274160.8 | 1780.0 | 257564.4 | 1616.9 |
| BIC | 1750.6 | 274168.2 | 1789.9 | 257571.7 | 1626.7 |
| Log.Lik. | -866.392 | -137077.401 | -886.021 | -128779.186 | -804.441 |
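The `conf_level` argument controls the coverage of these intervals. As a rough illustration of what changes with the level, this sketch computes intervals with base R's `confint()` on a model fitted to the built-in `mtcars` data. Both the model and the use of `confint()` are assumptions made for illustration; the text does not say how `msummary` computes its intervals internally.

```r
# Illustrative model on built-in data (not one of the models in this vignette)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Confidence intervals at two coverage levels
ci95 <- confint(mod, level = 0.95)
ci99 <- confint(mod, level = 0.99)

# Interval widths: higher coverage gives wider intervals for every term
width95 <- ci95[, 2] - ci95[, 1]
width99 <- ci99[, 2] - ci99[, 1]
print(cbind(width95, width99))
```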
To display uncertainty estimates next to coefficients instead of below them:
msummary(models, statistic_vertical = FALSE)
You can override the uncertainty estimates in a number of ways. First, you can specify a function that produces variance-covariance matrices:
You can supply a list of functions of the same length as your model list:
You can supply a list of named variance-covariance matrices:
You can supply a list of named vectors:
custom_stats <- list(`OLS 1` = c(`(Intercept)` = 2, Literacy = 3, Clergy = 4),
                     `Poisson 1` = c(`(Intercept)` = 3, Literacy = -5, Commerce = 3),
                     `OLS 2` = c(`(Intercept)` = 7, Literacy = -6, Clergy = 9),
                     `Poisson 2` = c(`(Intercept)` = 4, Literacy = -7, Commerce = -9),
                     `OLS 3` = c(`(Intercept)` = 1, Literacy = -5, Clergy = -2))
msummary(models, statistic_override = custom_stats)
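In practice, the named vectors are usually computed rather than typed by hand. A minimal sketch, assuming two illustrative models fitted to the built-in `mtcars` data (the names `mods` and `custom_se` are hypothetical), derives one named vector of standard errors per model from its variance-covariance matrix:

```r
# Two illustrative models on built-in data (not the models in this vignette)
mods <- list(
  'OLS A' = lm(mpg ~ wt, data = mtcars),
  'OLS B' = lm(mpg ~ wt + hp, data = mtcars)
)

# Standard errors are the square roots of the diagonal of vcov();
# the vector names are the coefficient names
custom_se <- lapply(mods, function(m) sqrt(diag(vcov(m))))
str(custom_se)
```

The resulting list has the same shape as `custom_stats` above: one element per model, each a vector named by coefficient. Swapping `vcov` for a robust estimator would follow the same pattern.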
You can also display several different uncertainty estimates below the coefficient estimates. For example,
Will produce something like this:
You can add a title to your table as follows:
msummary(models, title = 'This is a title for my table.')
`modelsummary` offers a powerful and innovative mechanism to rename, reorder, and subset coefficients and goodness-of-fit statistics.
The `coef_map` argument is a named vector which allows users to rename, reorder, and subset coefficient estimates. Values of this vector correspond to the “clean” variable name. Names of this vector correspond to the “raw” variable name. The table will be sorted in the order in which terms are presented in `coef_map`. Coefficients which are not included in `coef_map` will be excluded from the table.
cm <- c('Literacy' = 'Literacy (%)',
        'Commerce' = 'Patents per capita',
        '(Intercept)' = 'Constant')
msummary(models, coef_map = cm)
| | OLS 1 | Poisson 1 | OLS 2 | Poisson 2 | OLS 3 |
|---|---|---|---|---|---|
| Literacy (%) | -39.121 | 0.003 | 3.680 | -0.000 | -68.507 |
| | (37.052) | (0.000) | (46.552) | (0.000) | (18.029) |
| Patents per capita | | 0.011 | | 0.001 | |
| | | (0.000) | | (0.000) | |
| Constant | 7948.667 | 8.241 | 16259.384 | 9.876 | 11243.544 |
| | (2078.276) | (0.006) | (2611.140) | (0.003) | (1011.240) |
| Num.Obs. | 86 | 86 | 86 | 86 | 86 |
| R2 | 0.020 | | 0.065 | | 0.152 |
| Adj.R2 | -0.003 | | 0.043 | | 0.132 |
| AIC | 1740.8 | 274160.8 | 1780.0 | 257564.4 | 1616.9 |
| BIC | 1750.6 | 274168.2 | 1789.9 | 257571.7 | 1626.7 |
| Log.Lik. | -866.392 | -137077.401 | -886.021 | -128779.186 | -804.441 |
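The rename/reorder/subset behavior described for `coef_map` can be mimicked in a few lines of base R. This is only a sketch of the semantics, not the package's implementation; the model and the names `est`, `keep`, and `out` are assumptions for illustration.

```r
# Raw names on the left, clean names on the right; order defines table order
cm <- c('wt' = 'Weight', '(Intercept)' = 'Constant')

# Illustrative model on built-in data: terms are (Intercept), wt, hp
est <- coef(lm(mpg ~ wt + hp, data = mtcars))

# Subset and reorder: keep only mapped terms, in the order given by cm
keep <- intersect(names(cm), names(est))
out <- est[keep]

# Rename: replace raw names with clean names
names(out) <- cm[keep]
print(out)
```

Here `hp` is dropped because it is absent from `cm`, and `Weight` appears before `Constant` because that is the order in the map.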
An alternative mechanism to subset coefficients is to use the `coef_omit` argument. This string is a regular expression which will be fed to `stringr::str_detect` to detect the variable names which should be excluded from the table.
msummary(models, coef_omit = 'Intercept|Donation')
`gof_omit` is a regular expression which will be fed to `stringr::str_detect` to detect the names of the statistics which should be excluded from the table.
msummary(models, gof_omit = 'DF|Deviance|R2|AIC|BIC')
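To preview what a given regular expression will drop, the same patterns can be tried on plain character vectors. The sketch below uses base R's `grepl()` as a stand-in for `stringr::str_detect` (an assumption; the two agree for simple alternations like these), applied to the row labels that appear in the tables above.

```r
# Row labels taken from the tables above
coef_names <- c('(Intercept)', 'Literacy', 'Clergy', 'Commerce')
gof_names  <- c('Num.Obs.', 'R2', 'Adj.R2', 'AIC', 'BIC', 'Log.Lik.')

# Rows whose label matches the pattern are omitted; the rest are kept
kept_coefs <- coef_names[!grepl('Intercept|Donation', coef_names)]
kept_gof   <- gof_names[!grepl('DF|Deviance|R2|AIC|BIC', gof_names)]

print(kept_coefs)
print(kept_gof)
```

Note that the match is a substring match, so the pattern `R2` also removes `Adj.R2`.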
A more powerful mechanism is to supply a `data.frame` (or `tibble`) through the `gof_map` argument. This data.frame must include 4 columns:
- `raw`: a string with the name of a column produced by `broom::glance(model)`.
- `clean`: a string with the “clean” name of the statistic you want to appear in your final table.
- `fmt`: a string which will be used to round/format the statistic in question (e.g., `"%.3f"`). This follows the same standards as the `fmt` argument in `?modelsummary`.
- `omit`: `TRUE` if you want the statistic to be omitted from your final table.
You can see an example of a valid data frame by typing `modelsummary::gof_map`. This is the default data.frame that `modelsummary` uses to subset and reorder goodness-of-fit statistics. As you can see, `omit == TRUE` for quite a number of statistics. You can include those statistics by setting `omit` to `FALSE`:
The goodness-of-fit statistics will be printed in the table in the same order as in the `gof_map` data.frame.
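A hand-built data frame with the four required columns might look like the sketch below. The column names come from the description above; the three rows and their formats are illustrative assumptions, with `raw` values chosen from the columns that `broom::glance` reports for an `lm` fit.

```r
# A small, hypothetical gof_map: rows appear in the table in this order
gm <- data.frame(
  raw   = c('nobs', 'r.squared', 'AIC'),   # column names from broom::glance
  clean = c('Num.Obs.', 'R2', 'AIC'),      # labels shown in the table
  fmt   = c('%.0f', '%.3f', '%.1f'),       # sprintf-style formats
  omit  = c(FALSE, FALSE, TRUE),           # TRUE hides the statistic
  stringsAsFactors = FALSE
)
print(gm)

# Statistics that would be displayed, in order
shown <- gm$clean[!gm$omit]
print(shown)
```

Such a data frame would then be passed through the `gof_map` argument, for example `msummary(models, gof_map = gm)`.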
Notice the subtle difference between `coef_map` and `gof_map`. On the one hand, `coef_map` works as a “white list”: any coefficient not explicitly entered will be omitted from the table. On the other, `gof_map` works as a “black list”: statistics need to be explicitly marked for omission.
Some people like to add “stars” to their model summary tables to mark statistical significance. The `stars` argument can take three types of input:
- `NULL` omits any stars or special marks (default)
- `TRUE` uses these default values: `* p < 0.1, ** p < 0.05, *** p < 0.01`
- A named numeric vector sets custom marks and thresholds, such as `c('*' = .1, '+' = .05)`
Whenever `stars != FALSE`, `modelsummary` adds a note at the bottom of the table automatically. If you would like to omit this note, just use the `stars_note` argument:
msummary(models, stars = TRUE, stars_note = FALSE)
If you want to create your own stars description, you can add custom notes with the `notes` argument.
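To make the default thresholds concrete, the sketch below maps p-values to marks using the cutoffs quoted above (`* p < 0.1, ** p < 0.05, *** p < 0.01`). The helper `add_stars` is hypothetical and written for illustration only; it is not a function of the package.

```r
# Hypothetical helper: return the mark of the strictest cutoff that p satisfies
add_stars <- function(p, cutoffs = c('*' = 0.1, '**' = 0.05, '***' = 0.01)) {
  vapply(p, function(x) {
    hit <- cutoffs[x < cutoffs]          # cutoffs that this p-value clears
    if (length(hit) == 0) {
      ''                                 # not significant at any level
    } else {
      names(hit)[which.min(hit)]         # mark of the smallest cleared cutoff
    }
  }, character(1))
}

p_values <- c(0.2, 0.07, 0.03, 0.001)
marks <- add_stars(p_values)
print(data.frame(p = p_values, stars = marks))
```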
The `fmt` argument defines how numeric values are rounded and presented in the table. This argument follows the `sprintf` C-library standard. For example,
- `%.3f` will keep 3 digits after the decimal point, including trailing zeros.
- `%.5f` will keep 5 digits after the decimal point, including trailing zeros.
- Replacing the `f` with an `e` will use the exponential decimal representation.
Most users will just modify the `3` in `%.3f`, but this is a very powerful system, and all users are encouraged to read the details: `?sprintf`
msummary(models, fmt = '%.7f')
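Format strings can be tried directly with base R's `sprintf()` before passing them to `fmt`. The values below are arbitrary and chosen only to show rounding and trailing zeros.

```r
# Three digits after the decimal point, rounded
three <- sprintf('%.3f', 3.14159)
print(three)    # "3.142"

# Five digits after the decimal point, trailing zeros kept
five <- sprintf('%.5f', 0.5)
print(five)     # "0.50000"

# Exponential notation with three digits after the decimal point
expo <- sprintf('%.3e', 12345.678)
print(expo)
```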
Use the `add_rows` argument to add rows manually to the bottom of the table.
row1 <- c('Custom row 1', 'a', 'b', 'c', 'd', 'e')
row2 <- c('Custom row 2', 5:1)
msummary(models, add_rows = list(row1, row2))
Use the `add_rows_location` argument to specify where the custom rows should be displayed in the bottom panel. For example, this prints custom rows after the coefficients, but in first position among the goodness-of-fit measures:
This prints custom rows after the 2nd GOF statistic:
Users can pass any additional argument they want to the `tidy` method which is used to extract estimates from a model. For example, in logistic or Cox proportional hazard models, many users want to exponentiate coefficients to facilitate interpretation. The `tidy` functions supplied by the `broom` package allow users to set `exponentiate=TRUE` to achieve this. In `modelsummary`, users can use the same argument:
mod_logit <- glm(am ~ mpg, data = mtcars, family = binomial)
msummary(mod_logit, exponentiate = TRUE)
Any argument supported by `tidy` is thus supported by `modelsummary`.
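For reference, exponentiating logit coefficients converts log-odds into odds ratios. The sketch below performs that transformation by hand in base R on the same `mod_logit` model; it illustrates the arithmetic only and does not go through `tidy` or `msummary`.

```r
# Same logistic regression as above, on the built-in mtcars data
mod_logit <- glm(am ~ mpg, data = mtcars, family = binomial)

# Coefficients on the log-odds scale
log_odds <- coef(mod_logit)

# Exponentiated coefficients are odds ratios
odds_ratios <- exp(log_odds)
print(cbind(log_odds, odds_ratios))
```

An odds ratio above 1 for `mpg` corresponds to a positive coefficient on the log-odds scale.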
Warning: at the moment (2020-05-05), `broom::tidy` still reports `std.error` on the original scale. See this discussion on the `broom` GitHub page.
Warning: When users supply a file name to the `output` argument, the table is written immediately to file. This means that users cannot post-process and customize the resulting table using functions from `gt` or `kableExtra`. To save a customized table, you should apply all the customization functions you need before saving it using `gt::gtsave`, `kableExtra::save_kable`, or another appropriate helper function.
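The order of operations matters: customize first, save last. A minimal sketch with `gt`, using a plain `gt()` table built from the built-in `mtcars` data as a stand-in for the table returned by `msummary` (an assumption made so the example does not depend on the models above), and a hypothetical file name:

```r
library(gt)

# Stand-in table; in practice this would be the object returned by msummary()
tab <- gt(head(mtcars[, 1:3])) %>%
  # Apply every customization before writing to disk
  tab_header(title = 'Customized before saving') %>%
  tab_style(style = cell_text(weight = 'bold'),
            locations = cells_body(columns = 1))

# Save the finished table; the file name is illustrative
out_file <- file.path(tempdir(), 'custom_table.html')
gtsave(tab, filename = 'custom_table.html', path = tempdir())
```

The same pattern applies to `kableExtra::save_kable` for tables built with `kableExtra`.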
gt
Thanks to `gt`, `modelsummary` accepts markdown indications for emphasis and more:
msummary(models,
         title = md('This is a **bolded series of words.**'),
         notes = list(md('And an *emphasized note*.')))
We can modify the size of the text with `gt`’s `tab_style` function:
msummary(models) %>%
    tab_style(style = cell_text(size = 'x-large'),
              locations = cells_body(columns = 1))
We can also color columns and cells, and present values in bold or italics:
msummary(models) %>%
    tab_style(style = cell_fill(color = "lightcyan"),
              locations = cells_body(columns = vars(`OLS 1`))) %>%
    tab_style(style = cell_fill(color = "#F9E3D6"),
              locations = cells_body(columns = vars(`Poisson 2`), rows = 2:6)) %>%
    tab_style(style = cell_text(weight = "bold"),
              locations = cells_body(columns = vars(`OLS 1`))) %>%
    tab_style(style = cell_text(style = "italic"),
              locations = cells_body(columns = vars(`Poisson 2`), rows = 2:6))
| | OLS 1 | Poisson 1 | OLS 2 | Poisson 2 | OLS 3 |
|---|---|---|---|---|---|
| (Intercept) | 7948.667 | 8.241 | 16259.384 | 9.876 | 11243.544 |
| | (2078.276) | (0.006) | (2611.140) | (0.003) | (1011.240) |
| Clergy | 15.257 | | 77.148 | | -16.376 |
| | (25.735) | | (32.334) | | (12.522) |
| Literacy | -39.121 | 0.003 | 3.680 | -0.000 | -68.507 |
| | (37.052) | (0.000) | (46.552) | (0.000) | (18.029) |
| Commerce | | 0.011 | | 0.001 | |
| | | (0.000) | | (0.000) | |
| Num.Obs. | 86 | 86 | 86 | 86 | 86 |
| R2 | 0.020 | | 0.065 | | 0.152 |
| Adj.R2 | -0.003 | | 0.043 | | 0.132 |
| AIC | 1740.8 | 274160.8 | 1780.0 | 257564.4 | 1616.9 |
| BIC | 1750.6 | 274168.2 | 1789.9 | 257571.7 | 1626.7 |
| Log.Lik. | -866.392 | -137077.401 | -886.021 | -128779.186 | -804.441 |
Create spanning labels to group models (columns):
msummary(models) %>%
    tab_spanner(label = 'Literacy', columns = c('OLS 1', 'Poisson 1')) %>%
    tab_spanner(label = 'Desertion', columns = c('OLS 2', 'Poisson 2')) %>%
    tab_spanner(label = 'Clergy', columns = 'OLS 3')
Insert images in your tables using the `gt::text_transform` function together with `gt::local_image` for files on disk or, as in this example, `gt::web_image` for images hosted online.
f <- function(x) web_image(url = "https://user-images.githubusercontent.com/987057/82732352-b9aabf00-9cda-11ea-92a6-26750cf097d0.png", height = 80)
msummary(models) %>%
    text_transform(locations = cells_body(columns = 2:6, rows = 1), fn = f)
| | OLS 1 | Poisson 1 | OLS 2 | Poisson 2 | OLS 3 |
|---|---|---|---|---|---|
| (Intercept) | | | | | |
| | (2078.276) | (0.006) | (2611.140) | (0.003) | (1011.240) |
| Clergy | 15.257 | | 77.148 | | -16.376 |
| | (25.735) | | (32.334) | | (12.522) |
| Literacy | -39.121 | 0.003 | 3.680 | -0.000 | -68.507 |
| | (37.052) | (0.000) | (46.552) | (0.000) | (18.029) |
| Commerce | | 0.011 | | 0.001 | |
| | | (0.000) | | (0.000) | |
| Num.Obs. | 86 | 86 | 86 | 86 | 86 |
| R2 | 0.020 | | 0.065 | | 0.152 |
| Adj.R2 | -0.003 | | 0.043 | | 0.132 |
| AIC | 1740.8 | 274160.8 | 1780.0 | 257564.4 | 1616.9 |
| BIC | 1750.6 | 274168.2 | 1789.9 | 257571.7 | 1626.7 |
| Log.Lik. | -866.392 | -137077.401 | -886.021 | -128779.186 | -804.441 |
This is the code I used to generate the “complex” table posted at the top of this README.
cm <- c('Literacy' = 'Literacy (%)',
        'Clergy' = 'Priests/capita',
        'Commerce' = 'Patents/capita',
        'Infants' = 'Infants',
        '(Intercept)' = 'Constant')
msummary(models,
         coef_map = cm,
         stars = TRUE,
         gof_omit = "Deviance",
         title = 'modelsummary package for R',
         notes = c('The most important parameter is printed in red.')) %>%
    tab_spanner(label = 'Donations', columns = 2:3) %>%
    tab_spanner(label = 'Crimes (persons)', columns = 4:5) %>%
    tab_spanner(label = 'Crimes (property)', columns = 6) %>%
    tab_footnote(footnote = md("Very **important** variable."),
                 locations = cells_body(rows = 3, columns = 1)) %>%
    tab_style(style = cell_text(color = 'red'),
              locations = cells_body(rows = 3, columns = 4))
kableExtra
Note that compiling this LaTeX table requires loading the `booktabs` and `xcolor` packages in the preamble of your LaTeX or Rmarkdown document.
The `gt` LaTeX render engine is still immature. Until it improves, I strongly recommend that users turn to `kableExtra` to produce LaTeX tables. This package offers robust functions that allow a lot of customization. A simple LaTeX table can be produced as follows:
msummary(models, output = 'latex')
We can use functions from the `kableExtra` package to customize this table, with bold and colored cells, column spans, and more.
The `row_spec` and `column_spec` functions allow users to change the styling of their tables. For instance, this code creates a table where the first column is in bold blue text on pink background:
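A minimal sketch of this kind of styling, using a plain `knitr::kable` LaTeX table built from the built-in `mtcars` data as a stand-in for the `msummary` output (an assumption, so the example runs without the models above):

```r
library(knitr)
library(kableExtra)

# Stand-in LaTeX table; in practice this would come from
# msummary(models, output = 'latex')
tab <- kable(head(mtcars[, 1:3]), format = 'latex', booktabs = TRUE) %>%
  # Style the first column: bold blue text on a pink background
  column_spec(1, bold = TRUE, color = 'blue', background = 'pink')

# Inspect the generated LaTeX source
print(tab)
```

`row_spec` takes the same `bold`, `color`, and `background` arguments but targets a row index instead of a column index.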
You can define column group labels using `kableExtra`’s `add_header_above` function:
msummary(models, output = 'latex') %>%
add_header_above(c(" " = 1,
"Donations" = 2,
"Crimes (person)" = 2,
"Crimes (property)" = 1))
cm <- c('Literacy' = 'Literacy (%)',
        'Clergy' = 'Priests/capita',
        'Commerce' = 'Patents/capita',
        'Infants' = 'Infants',
        '(Intercept)' = 'Constant')
msummary(models,
         coef_map = cm,
         stars = TRUE,
         gof_omit = "Deviance",
         title = 'modelsummary package for R',
         notes = c('First custom note to contain text.',
                   'Second custom note with different content.')) %>%
    add_header_above(c(" " = 1,
                       "Donations" = 2,
                       "Crimes (person)" = 2,
                       "Crimes (property)" = 1)) %>%
    row_spec(3, bold = TRUE, color = 'blue', background = 'pink')