Rdatasets is a collection of 2337 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.
The list of datasets (CSV files and documentation) is available here:
The GitHub repository also contains the scripts I use to scrape the data and update the website.
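Because the datasets are distributed as plain CSV files, they can be read from environments other than R. The snippet below is a minimal sketch in Python using only the standard library. The base URL, the `csv/<package>/<item>.csv` layout, and the helper names `dataset_csv_url`, `parse_csv`, and `fetch_dataset` are assumptions made for illustration and are not stated in this document; check the dataset list for the actual link to each file.

```python
import csv
import io
from urllib.request import urlopen

# ASSUMPTION: illustrative base URL and path layout, not taken from this
# document. Replace with the real link shown in the dataset list.
BASE_URL = "https://vincentarelbundock.github.io/Rdatasets"


def dataset_csv_url(package, item, base_url=BASE_URL):
    """Build the (assumed) URL of the CSV file for one dataset."""
    return f"{base_url.rstrip('/')}/csv/{package}/{item}.csv"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    All values are returned as strings; convert numeric columns yourself.
    """
    return list(csv.DictReader(io.StringIO(text)))


def fetch_dataset(package, item, base_url=BASE_URL):
    """Download one dataset and parse it. Requires network access."""
    with urlopen(dataset_csv_url(package, item, base_url)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

For example, `fetch_dataset("datasets", "iris")` would try to download the `iris` data from the R package `datasets`, provided the assumed URL layout matches the published files. Keeping the URL construction and the parsing in separate functions makes it possible to check each step without a network connection.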
Rdatasets only includes data from packages published on the CRAN repository. Please open an issue on the GitHub repository if you would like me to add data from a new package.
The code in this repository is licensed under GPL-3.
I believe that the R documentation which I copied to the Rdatasets html folder is licensed under the GPL. You will find a copy of the GPL in the Rdatasets GitHub repository.
I made a good-faith effort to determine the license under which the actual data (i.e., rows and columns of numbers) were distributed, but I was unable to find a definitive answer. My understanding is that these datasets are free to redistribute. However, if you own the rights to data that are included here and you object to their inclusion in Rdatasets, please send me an email at vincent.arel-bundock@umontreal.ca. I will promptly remove the data in question and make sure that all traces are erased from the git revision history.