Rdatasets is a collection of 2264 datasets which were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.

What is included?

The list of datasets (CSV files and documentation) can be found here:

On the GitHub repository, you will also find the scripts I use to scrape data and update the website.
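Because the datasets are distributed as CSV files, they can also be read outside of R. The following is a minimal sketch in Python, not part of Rdatasets itself: the helper name `load_rows` and the inline sample data are hypothetical stand-ins, and a real file downloaded from the list will have different columns.

```python
import csv
import io


def load_rows(handle):
    """Parse CSV text from an open file-like object into a list of dicts.

    Hypothetical helper for illustration; every value is returned as a string.
    """
    reader = csv.DictReader(handle)
    return list(reader)


# Hypothetical stand-in for a downloaded CSV file; real datasets will differ.
sample = io.StringIO("x,y\n1,2.5\n2,3.5\n")
rows = load_rows(sample)
print(len(rows), rows[0]["y"])
```

To use this with a file saved locally, open it with `open(path, newline="")` and pass the handle to `load_rows`. Values come back as strings, so numeric columns need to be converted explicitly.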

Adding data

Rdatasets only includes data from packages published on the CRAN repository. Please open an issue on the GitHub repository if you would like me to add data from a new package.

License

The code in this repository is licensed under GPL-3.

I believe that the R documentation which I copied to the Rdatasets HTML folder is licensed under the GPL. You will find a copy of the GPL in the Rdatasets GitHub repository.

I made a good-faith effort to determine the license under which the actual data (i.e., rows/columns of numbers) were distributed, but I was unable to find a definitive answer. My understanding is that these datasets are free to redistribute. However, if you own the rights to data that are included here and you object to their inclusion in Rdatasets, send me an email at . I will promptly remove the data in question and will make sure that all traces are erased from the git revision history.