This function was inspired by the excellent skimr package for R. See the Details and Examples sections below, and the vignettes on the modelsummary website:

  • https://modelsummary.com/

  • https://modelsummary.com/articles/datasummary.html

Usage

datasummary_skim(
  data,
  type = "numeric",
  output = "default",
  fmt = "%.1f",
  histogram = TRUE,
  title = NULL,
  notes = NULL,
  align = NULL,
  escape = TRUE,
  ...
)

Arguments

data

A data.frame (or tibble)

type

Type of variables to summarize: "numeric" or "categorical" (character string)

output

filename or object type (character string)

  • Supported filename extensions: .docx, .html, .tex, .md, .txt, .csv, .xlsx, .png, .jpg

  • Supported object types: "default", "html", "markdown", "latex", "latex_tabular", "data.frame", "gt", "kableExtra", "huxtable", "flextable", "DT", "jupyter". The "modelsummary_list" value produces a lightweight object which can be saved and fed back to the modelsummary function.

  • The "default" output format can be set to "kableExtra", "gt", "flextable", "huxtable", "DT", or "markdown"

    • If the user does not choose a default value, the packages listed above are tried in sequence.

    • Session-specific configuration: options("modelsummary_factory_default" = "gt")

    • Persistent configuration: config_modelsummary(output = "markdown")

  • Warning: Users should not supply a file name to the output argument if they intend to customize the table with external packages. See the 'Details' section.

  • LaTeX compilation requires the booktabs and siunitx packages, but siunitx can be disabled or replaced with global options. See the 'Details' section.
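As a rough illustration of the output argument, the sketch below asks for a plain data.frame rather than a rendered table. It assumes that the modelsummary package is installed and that its datasummary_skim signature matches the Usage section above; the object name `tab` is only illustrative.

```r
# Guarded so the snippet also runs (as a no-op) when modelsummary is missing.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  # output = "data.frame" returns an object instead of writing a file.
  # histogram = FALSE because histograms are only documented for some outputs.
  tab <- modelsummary::datasummary_skim(
    mtcars,
    type = "numeric",
    output = "data.frame",
    histogram = FALSE
  )
  print(tab)
}
```

Supplying a file name such as "table.docx" instead would write the table to disk, which is why the warning above advises against file names when the table will be customized further.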

fmt

how to format numeric values: integer, user-supplied function, or modelsummary function.

  • Integer: Number of decimal digits

  • User-supplied functions:

    • Any function which accepts a numeric vector and returns a character vector of the same length.

  • modelsummary functions:

    • fmt = fmt_significant(2): Two significant digits (at the term-level)

    • fmt = fmt_sprintf("%.3f"): See ?sprintf

    • fmt = fmt_identity(): unformatted raw values
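A minimal sketch of a user-supplied formatting function follows. The name `fmt_thousands` is hypothetical and not part of modelsummary; the only requirement taken from the text above is that the function accepts a numeric vector and returns a character vector of the same length. The call to datasummary_skim assumes the signature shown in the Usage section.

```r
# Hypothetical formatter: two decimals with a thousands separator.
fmt_thousands <- function(x) {
  formatC(x, format = "f", digits = 2, big.mark = ",")
}

# One string is returned per input value.
fmt_thousands(c(1234.5678, 0.5))

# Guarded so the snippet also runs when modelsummary is not installed.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  print(modelsummary::datasummary_skim(
    mtcars,
    fmt = fmt_thousands,
    output = "data.frame",
    histogram = FALSE
  ))
}
```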

histogram

include a histogram (TRUE/FALSE). Supported for:

  • type = "numeric"

  • output is "html", "default", "jpg", "png", or "kableExtra"

  • PDF and HTML documents compiled via Rmarkdown or knitr

  • See the examples section below for an example of how to use datasummary to include histograms in other formats such as markdown.

title

string

notes

list or vector of notes to append to the bottom of the table.

align

A string with a number of characters equal to the number of columns in the table (e.g., align = "lcc"). Valid characters: l, c, r, d.

  • "l": left-aligned column

  • "c": centered column

  • "r": right-aligned column

  • "d": dot-aligned column. For LaTeX/PDF output, this option requires at least version 3.0.25 of the siunitx LaTeX package. These commands must appear in the LaTeX preamble (they are added automatically when compiling Rmarkdown documents to PDF):

    • \usepackage{booktabs}

    • \usepackage{siunitx}

    • \newcolumntype{d}{S[ input-open-uncertainty=, input-close-uncertainty=, parse-numbers = false, table-align-text-pre=false, table-align-text-post=false ]}

escape

boolean. If TRUE, escapes or substitutes LaTeX/HTML characters which could prevent the file from compiling or displaying. This setting does not affect captions or notes.

...

all other arguments are passed through to the table-making functions kableExtra::kbl, gt::gt, DT::datatable, etc. depending on the output argument. This allows users to pass arguments directly to datasummary in order to affect the behavior of other functions behind the scenes.

Global Options

The behavior of modelsummary can be modified by setting global options. For example:

  • options(modelsummary_model_labels = "roman")

The rest of this section describes the available options.

Model labels: default column names

These global options change the style of the default column headers:

  • options(modelsummary_model_labels = "roman")

  • options(modelsummary_panel_labels = "roman")

The supported styles are: "model", "panel", "arabic", "letters", "roman", "(arabic)", "(letters)", "(roman)"

The panel-specific option is only used when shape = "rbind".

Table-making packages

modelsummary supports 5 table-making packages: kableExtra, gt, flextable, huxtable, and DT. Some of these packages have overlapping functionality. For example, 3 of those packages can export to LaTeX. To change the default backend used for a specific file format, you can use the options function:

options(modelsummary_factory_html = 'kableExtra')

options(modelsummary_factory_latex = 'gt')

options(modelsummary_factory_word = 'huxtable')

options(modelsummary_factory_png = 'gt')
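Because these are ordinary R options, they can be set for part of a session and then restored. The sketch below uses only base R; the variable name `old` is illustrative, and the option names are the ones listed above.

```r
# Set backends for this session and keep the previous values.
old <- options(
  modelsummary_factory_html = "kableExtra",
  modelsummary_factory_latex = "gt"
)

# Inspect the current setting.
getOption("modelsummary_factory_html")

# ... build tables here ...

# Restore whatever was set before (unset options are removed again).
options(old)
```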

Table themes

Change the look of tables in an automated and replicable way, using the modelsummary theming functionality. See the vignette: https://modelsummary.com/articles/appearance.html

  • modelsummary_theme_gt

  • modelsummary_theme_kableExtra

  • modelsummary_theme_huxtable

  • modelsummary_theme_flextable

  • modelsummary_theme_dataframe

Model extraction functions

modelsummary can use two sets of packages to extract information from statistical models: the easystats family (performance and parameters) and broom. By default, it uses easystats first and then falls back on broom in case of failure. You can change the order of priorities or include goodness-of-fit extracted by both packages by setting:

options(modelsummary_get = "broom")

options(modelsummary_get = "easystats")

options(modelsummary_get = "all")

Formatting numeric entries

By default, LaTeX tables enclose all numeric entries in the \num{} command from the siunitx package. To prevent this behavior, or to enclose numbers in dollar signs (for LaTeX math mode), users can call:

options(modelsummary_format_numeric_latex = "plain")

options(modelsummary_format_numeric_latex = "mathmode")

A similar option can be used to display numerical entries using MathJax in HTML tables:

options(modelsummary_format_numeric_html = "mathjax")

Examples

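library(modelsummary)

# Convert two columns so that the data contains categorical variables
dat <- mtcars
dat$vs <- as.logical(dat$vs)
dat$cyl <- as.factor(dat$cyl)

# Summarize numeric variables (the default), then categorical variables
datasummary_skim(dat)
datasummary_skim(dat, type = "categorical")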

# You can use `datasummary` to produce a similar table in different formats.
# Note that the `Histogram` function relies on unicode characters. These
# characters will only display correctly in some operating systems, under some
# locales, using some fonts. Displaying such histograms on Windows computers
# is notoriously tricky. The `modelsummary` authors cannot provide support to
# display these unicode histograms.

f <- All(mtcars) ~ Mean + SD + Min + Median + Max + Histogram
# datasummary(f, mtcars, output="markdown")

References

Arel-Bundock V (2022). “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software, 103(1), 1-23. doi:10.18637/jss.v103.i01.