Generate a correlation table for all numeric variables in your dataset.
Source: R/datasummary_correlation.R
datasummary_correlation.Rd
The names of the variables displayed in the correlation table are the names
of the columns in the data. You can rename those columns (with or without
spaces) to produce a table of human-readable variables. See the Details and
Examples sections below, and the vignettes on the modelsummary website:
https://vincentarelbundock.github.io/modelsummary/
https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html
Usage
datasummary_correlation(
  data,
  output = "default",
  method = "pearson",
  fmt = 2,
  align = NULL,
  add_rows = NULL,
  add_columns = NULL,
  title = NULL,
  notes = NULL,
  escape = TRUE,
  ...
)
Arguments
- data
A data.frame (or tibble)
- output
filename or object type (character string)
Supported filename extensions: .docx, .html, .tex, .md, .txt, .png, .jpg.
Supported object types: "default", "html", "markdown", "latex", "latex_tabular", "data.frame", "gt", "kableExtra", "huxtable", "flextable", "jupyter". The "modelsummary_list" value produces a lightweight object which can be saved and fed back to the modelsummary function.
Warning: Users should not supply a file name to the output argument if they intend to customize the table with external packages. See the 'Details' section.
LaTeX compilation requires the booktabs and siunitx packages, but siunitx can be disabled or replaced with global options. See the 'Details' section.
The default output formats and table-making packages can be modified with global options. See the 'Details' section.
- method
character or function
character: "pearson", "kendall", "spearman", or "pearspear" (Pearson correlations above and Spearman correlations below the diagonal)
function: takes a data.frame with numeric columns and returns a square matrix or data.frame with unique row.names and colnames corresponding to variable names. Note that the datasummary_correlation_format function can often be useful for formatting the output of custom correlation functions.
- fmt
determines how to format numeric values
integer: the number of digits to keep after the period, as in format(round(x, fmt), nsmall=fmt)
character: passed to the sprintf function (e.g., '%.3f' keeps 3 digits with trailing zero). See ?sprintf
function: returns a formatted character string.
NULL: does not format numbers, which allows users to include functions in the "glue" strings in the estimate and statistic arguments.
- align
A string with a number of characters equal to the number of columns in the table (e.g., align = "lcc"). Valid characters: l, c, r, d.
"l": left-aligned column
"c": centered column
"r": right-aligned column
"d": dot-aligned column. Only supported for LaTeX/PDF tables produced by kableExtra. These commands must appear in the LaTeX preamble (they are added automatically when compiling Rmarkdown documents to PDF):
\usepackage{booktabs}
\usepackage{siunitx}
\newcolumntype{d}{S[input-symbols = ()]}
- add_rows
a data.frame (or tibble) with the same number of columns as your main table. By default, rows are appended to the bottom of the table. You can define a "position" attribute of integers to set the row positions. See Examples section below.
- add_columns
a data.frame (or tibble) with the same number of rows as your main table.
- title
string
- notes
list or vector of notes to append to the bottom of the table.
- escape
boolean. TRUE escapes or substitutes LaTeX/HTML characters which could prevent the file from compiling/displaying. This setting does not affect captions or notes.
- ...
other parameters are passed through to the table-making packages.
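The three non-NULL forms of the fmt argument described above can be compared side by side in base R. This is only a sketch of what each form does to a numeric vector; the helper name fmt_fun is hypothetical, and the final call assumes the modelsummary package is installed (it is skipped otherwise).

```r
x <- c(0.5, -0.12345, 1)

# integer: digits kept after the period, as in format(round(x, fmt), nsmall = fmt)
fmt_int <- format(round(x, 2), nsmall = 2)

# character: a template passed to sprintf()
fmt_chr <- sprintf("%.3f", x)

# function: any function that returns a formatted character vector
# (`fmt_fun` is a hypothetical name used for illustration only)
fmt_fun <- function(v) paste0(formatC(v * 100, format = "f", digits = 1), "%")
fmt_pct <- fmt_fun(x)

# Assumption: modelsummary is installed; otherwise this step is skipped.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  tab <- modelsummary::datasummary_correlation(
    mtcars[, c("mpg", "hp")],
    fmt = "%.3f",
    output = "data.frame"
  )
}
```

Note that format() pads its results to a common width, so the integer form may return strings with leading spaces.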
Global Options
The behavior of modelsummary can be affected by setting global options:
modelsummary_factory_default
modelsummary_factory_latex
modelsummary_factory_html
modelsummary_factory_png
modelsummary_get
modelsummary_format_numeric_latex
modelsummary_format_numeric_html
Table-making packages
modelsummary supports 4 table-making packages: kableExtra, gt, flextable, and
huxtable. Some of these packages have overlapping functionalities. For
example, 3 of those packages can export to LaTeX. To change the default
backend used for a specific file format, you can use the options function:
options(modelsummary_factory_html = 'kableExtra')
options(modelsummary_factory_latex = 'gt')
options(modelsummary_factory_word = 'huxtable')
options(modelsummary_factory_png = 'gt')
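Because these are ordinary R global options, a change can be scoped to part of a script and then undone. The sketch below relies only on base R's options() and getOption(); the choice of "kableExtra" for HTML simply mirrors the line above.

```r
# options() returns the previous values (invisibly), so they can be restored
old <- options(modelsummary_factory_html = "kableExtra")

# the new backend is now the session default for HTML output
current <- getOption("modelsummary_factory_html")

# ... build HTML tables here ...

# restore whatever was set before (an unset option goes back to NULL)
options(old)
restored <- getOption("modelsummary_factory_html")
```

Restoring the saved list avoids leaking a backend choice into later, unrelated code in the same session.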
Model extraction functions
modelsummary can use two sets of packages to extract information from
statistical models: the easystats family (performance and parameters)
and broom. By default, it uses easystats first and then falls back on
broom in case of failure. You can change the order of priorities or include
goodness-of-fit extracted by both packages by setting:
options(modelsummary_get = "broom")
options(modelsummary_get = "easystats")
options(modelsummary_get = "all")
Formatting numeric entries
By default, LaTeX tables enclose all numeric entries in the \num{} command
from the siunitx package. To prevent this behavior, or to enclose numbers
in dollar signs (for LaTeX math mode), users can call:
options(modelsummary_format_numeric_latex = "plain")
options(modelsummary_format_numeric_latex = "mathmode")
A similar option can be used to display numerical entries using MathJax in HTML tables:
options(modelsummary_format_numeric_html = "mathjax")
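To make the difference between the LaTeX settings concrete, the sketch below mimics how a single table entry would look under each one. The helper wrap_latex and its style labels are hypothetical and written only for illustration; this is not the package's own code.

```r
# Hypothetical helper: imitates the appearance of a LaTeX table entry under
# each setting described above. Not part of modelsummary.
wrap_latex <- function(x, style = c("siunitx", "plain", "mathmode")) {
  style <- match.arg(style)
  switch(style,
    siunitx  = sprintf("\\num{%s}", x),  # default: siunitx \num{} command
    plain    = as.character(x),          # no wrapping at all
    mathmode = sprintf("$%s$", x)        # dollar signs for LaTeX math mode
  )
}

entry <- "0.78"
styles <- c("siunitx", "plain", "mathmode")
wrapped <- vapply(styles, function(s) wrap_latex(entry, s), character(1))

# cat() shows the raw LaTeX, without R's string escaping
cat(wrapped, sep = "\n")
```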
References
Arel-Bundock V (2022). “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software, 103(1), 1-23. doi:10.18637/jss.v103.i01.
Examples
if (FALSE) {
library(modelsummary)
# clean variable names (base R)
dat <- mtcars[, c("mpg", "hp")]
colnames(dat) <- c("Miles / Gallon", "Horse Power")
datasummary_correlation(dat)
# clean variable names (tidyverse)
library(tidyverse)
dat <- mtcars %>%
  select(`Miles / Gallon` = mpg,
         `Horse Power` = hp)
datasummary_correlation(dat)
# alternative methods
datasummary_correlation(dat, method = "pearspear")
# custom function
cor_fun <- function(x) cor(x, method = "kendall")
datasummary_correlation(dat, method = cor_fun)
# rename columns alphabetically and include a footnote for reference
note <- sprintf("(%s) %s", letters[1:ncol(dat)], colnames(dat))
note <- paste(note, collapse = "; ")
colnames(dat) <- sprintf("(%s)", letters[1:ncol(dat)])
datasummary_correlation(dat, notes = note)
# `datasummary_correlation_format`: custom function with formatting
dat <- mtcars[, c("mpg", "hp", "disp")]
cor_fun <- function(x) {
  out <- cor(x, method = "kendall")
  datasummary_correlation_format(
    out,
    fmt = 2,
    upper_triangle = "x",
    diagonal = ".")
}
datasummary_correlation(dat, method = cor_fun)
# use kableExtra and psych to color significant cells
library(psych)
library(kableExtra)
dat <- mtcars[, c("vs", "hp", "gear")]
cor_fun <- function(dat) {
  # compute correlations and format them
  correlations <- data.frame(cor(dat))
  correlations <- datasummary_correlation_format(correlations, fmt = 2)
  # calculate pvalues using the `psych` package
  pvalues <- psych::corr.test(dat)$p
  # use `kableExtra::cell_spec` to color significant cells
  for (i in seq_len(nrow(correlations))) {
    for (j in seq_len(ncol(correlations))) {
      if (pvalues[i, j] < 0.05 && i != j) {
        correlations[i, j] <- cell_spec(correlations[i, j], background = "pink")
      }
    }
  }
  return(correlations)
}
# The `escape=FALSE` is important here!
datasummary_correlation(dat, method = cor_fun, escape = FALSE)
}
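The "pearspear" layout used in the examples (Pearson correlations above the diagonal, Spearman correlations below it) can be reproduced by hand with base R, which may help when checking a table. The function name pearspear_sketch is hypothetical, and this is an illustration of the layout rather than the package's implementation; the last step assumes modelsummary is installed and is skipped otherwise.

```r
# Hypothetical custom method: takes a data.frame with numeric columns and
# returns a square matrix whose row and column names are the variable names.
pearspear_sketch <- function(x) {
  pearson  <- cor(x, method = "pearson")
  spearman <- cor(x, method = "spearman")
  mixed <- pearson                                          # upper triangle: Pearson
  mixed[lower.tri(mixed)] <- spearman[lower.tri(spearman)]  # lower triangle: Spearman
  mixed
}

dat <- mtcars[, c("mpg", "hp", "disp")]
mixed <- pearspear_sketch(dat)
print(round(mixed, 2))

# Assumption: modelsummary is installed; otherwise this step is skipped.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  tab <- modelsummary::datasummary_correlation(
    dat,
    method = pearspear_sketch,
    output = "data.frame"
  )
}
```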