Datasaurus | R Documentation |
The Datasaurus Dozen
Description
An illustrative exercise in never trusting summary statistics without also visualizing the underlying data.
Usage
Datasaurus
Format
A data frame with 1,846 observations on the following 3 variables.
dataset
the particular data set, one of 12
x
a random variable
y
another random variable
Details
Data were created by Alberto Cairo to illustrate that you should always
visualize your data rather than rely on summary statistics alone. These are
12 data sets, in long form, each with a mean of x of about 54.26 and a mean
of y of about 47.83. The standard deviation of x is about 16.76 and the
standard deviation of y is about 26.93. x and y correlate weakly, at
about -0.06.
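The near-identical statistics described above can be checked, and then contrasted with the plots, using a short sketch like the one below. It uses base R only. The helpers summarize_by_dataset() and plot_by_dataset() are hypothetical names introduced for illustration and are not part of the package. The sketch assumes the data frame Datasaurus has already been loaded into the session and skips the final step if it has not.

```r
# Hypothetical helper (not part of the package): per-dataset summary
# statistics for a long-form data frame with columns dataset, x, and y.
summarize_by_dataset <- function(df) {
  groups <- split(df, df$dataset, drop = TRUE)
  stats <- lapply(groups, function(g) {
    data.frame(
      dataset = as.character(g$dataset[1]),
      mean_x  = mean(g$x),
      mean_y  = mean(g$y),
      sd_x    = sd(g$x),
      sd_y    = sd(g$y),
      cor_xy  = cor(g$x, g$y)
    )
  })
  out <- do.call(rbind, stats)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: one scatter plot per data set on a shared grid,
# so that the differing shapes can be compared side by side.
plot_by_dataset <- function(df) {
  groups <- split(df, df$dataset, drop = TRUE)
  n <- length(groups)
  n_cols <- ceiling(sqrt(n))
  old <- par(mfrow = c(ceiling(n / n_cols), n_cols), mar = c(2, 2, 2, 1))
  on.exit(par(old))  # restore the previous graphics settings
  for (name in names(groups)) {
    plot(groups[[name]]$x, groups[[name]]$y,
         main = name, xlab = "x", ylab = "y", pch = 16, cex = 0.5)
  }
  invisible(NULL)
}

# Assumption: Datasaurus is already available in the session.
if (exists("Datasaurus")) {
  print(summarize_by_dataset(Datasaurus), digits = 4)
  plot_by_dataset(Datasaurus)
}
```

The summary table should show nearly the same values in every row, while the grid of scatter plots shows how different the underlying point clouds are.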
Author(s)
Alberto Cairo, Justin Matejka, George Fitzmaurice
References
Cairo, Alberto. 2016. “Download the Datasaurus: Never trust summary statistics alone; always visualize your data.” URL: http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html
Matejka, Justin and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing.” ACM SIGCHI Conference on Human Factors in Computing Systems.