| anscombe_quartet | R Documentation |
Anscombe's Quartet Data
Description
This dataset contains 44 observations, 11 from each of 4 datasets
generated by Francis Anscombe to demonstrate that statistical summary
measures alone cannot capture the full relationship between two variables
(here, x and y). Anscombe emphasized the importance of visualizing data
prior to calculating summary statistics.
Usage
anscombe_quartet
Format
A dataframe with 44 rows and 3 variables:
- dataset: the dataset the values come from
- x: the x-variable
- y: the y-variable
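The layout above is a long format: one row per observation, with dataset identifying which of the four sets a row belongs to. As a minimal sketch, the same shape can be rebuilt from the wide copy of Anscombe's data that ships with base R as `datasets::anscombe` (columns `x1` through `x4` and `y1` through `y4`). The labels "1" to "4" used for `dataset` here, and the name `anscombe_long`, are illustrative assumptions and may not match the labels stored in `anscombe_quartet`.

```r
library(datasets)

# Stack base R's wide `anscombe` data into one long data frame.
# The dataset labels "1".."4" are a hypothetical choice for illustration.
anscombe_long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    dataset = as.character(i),        # which of the four sets this row is from
    x = anscombe[[paste0("x", i)]],   # x-variable for set i
    y = anscombe[[paste0("y", i)]]    # y-variable for set i
  )
}))

dim(anscombe_long)             # 44 rows, 3 columns
table(anscombe_long$dataset)   # 11 observations per dataset
```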
Details
- Dataset 1 has a linear relationship between x and y.
- Dataset 2 shows a nonlinear relationship between x and y.
- Dataset 3 has a linear relationship between x and y with a single outlier.
- Dataset 4 shows no relationship between x and y, apart from a single outlier that serves as a high-leverage point.
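The outlier in dataset 3 and the high-leverage point in dataset 4 can also be located numerically. A minimal sketch, using base R's wide `anscombe` data (so `x3`/`y3` and `x4`/`y4` stand in for datasets 3 and 4): the largest absolute residual of a least-squares fit flags the dataset 3 outlier, and `hatvalues()` flags the dataset 4 leverage point. These two diagnostics are our choice for illustration, not part of Anscombe's presentation, and the variable names are hypothetical.

```r
library(stats)

fit3 <- lm(y3 ~ x3, data = anscombe)
fit4 <- lm(y4 ~ x4, data = anscombe)

# Dataset 3: the outlier is the observation with the largest absolute residual.
outlier3 <- unname(which.max(abs(residuals(fit3))))
anscombe[outlier3, c("x3", "y3")]   # x = 13, y = 12.74

# Dataset 4: ten points share x = 8, so the single point at x = 19
# alone determines the slope and has the maximum possible leverage.
lev4 <- hatvalues(fit4)
leverage_point4 <- unname(which.max(lev4))
anscombe[leverage_point4, c("x4", "y4")]   # x = 19, y = 12.5
max(lev4)   # 1 (up to rounding): the fitted line passes through this point
```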
In each of the datasets, the following statistical summaries hold (to two or three decimal places):
- mean of x: 9
- variance of x: 11
- mean of y: 7.5
- variance of y: 4.125
- correlation between x and y: 0.816
- linear regression of y on x: y = 3 + 0.5x
- R^2 for the regression: 0.67
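These shared summaries can be checked directly. A minimal sketch, again using the wide `anscombe` data from base R rather than `anscombe_quartet` itself, computes each statistic per dataset with `mean()`, `var()` (sample variance, denominator n - 1), `cor()` and `lm()`; the name `summaries` is illustrative.

```r
library(stats)

# One row of summary statistics per dataset (columns x1..x4, y1..y4).
summaries <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)                     # least-squares regression of y on x
  c(mean_x    = mean(x),
    var_x     = var(x),                # sample variance (n - 1 denominator)
    mean_y    = mean(y),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}))
rownames(summaries) <- paste("Dataset", 1:4)

round(summaries, 3)
```

The four rows agree to the stated precision even though the scatterplots of the four datasets look very different.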
References
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899.