This is a very quick post on how to get a list of the datasets available from within R, along with a basic description of each (the package it can be found in and its number of observations and variables). It always takes me some time to find the right dataset to showcase whatever process or method I’m working with, so this was really to make my life easier. So! I’m going to scrape the table listing R datasets from the Rdatasets index page (the URL is in the code below) using the rvest and xml2 packages:

library(rvest)
library(xml2)
library(dplyr)

url <- "https://vincentarelbundock.github.io/Rdatasets/datasets.html"

r_datasets <- read_html(url) %>% # read url
    html_nodes("table") %>% # extract all the tables
    .[[2]] %>% # it's the second table we want
    html_table() # convert it to a usable format (data.frame)
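
Picking the table by position works as long as the page layout stays the same. Here is a minimal sketch of one alternative, assuming the table we want is the only one with both a Package and an Item column. The inline HTML is a made-up stand-in for the real page so the example runs offline, and `page`, `tables`, `has_cols` and `picked` are names I introduced just for illustration:

```r
library(rvest)

# Hypothetical stand-in for the real page: two tables, only the second is the index
page <- read_html('
<html><body>
<table><tr><th>Link</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Package</th><th>Item</th><th>Rows</th><th>Cols</th></tr>
  <tr><td>datasets</td><td>BOD</td><td>6</td><td>2</td></tr>
  <tr><td>MASS</td><td>cats</td><td>144</td><td>3</td></tr>
</table>
</body></html>')

# Convert every table on the page into a data frame
tables <- html_table(html_nodes(page, "table"))

# Flag the tables whose header contains the columns we expect
has_cols <- vapply(
  tables,
  function(tb) all(c("Package", "Item") %in% names(tb)),
  logical(1)
)

# Keep the first table that matches
picked <- tables[[which(has_cols)[1]]]
picked
```

Selecting by column names costs a few more lines than `.[[2]]`, but it should keep working if another table is ever added above the one we care about.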

As a result, we get a tidy data frame…

str(r_datasets)
## 'data.frame':    1072 obs. of  7 variables:
##  $ Package: chr  "datasets" "datasets" "datasets" "datasets" ...
##  $ Item   : chr  "AirPassengers" "BJsales" "BOD" "CO2" ...
##  $ Title  : chr  "Monthly Airline Passenger Numbers 1949-1960" "Sales Data with Leading Indicator" "Biochemical Oxygen Demand" "Carbon Dioxide Uptake in Grass Plants" ...
##  $ Rows   : int  144 150 6 237 6 4 72 84 98 50 ...
##  $ Cols   : int  2 2 2 2 2 4 2 2 2 5 ...
##  $ csv    : chr  "CSV" "CSV" "CSV" "CSV" ...
##  $ doc    : chr  "DOC" "DOC" "DOC" "DOC" ...

library(knitr)
r_datasets %>%
  select(-c(csv, doc)) %>%
  head() %>%
  kable()

| Package  | Item          | Title                                       | Rows | Cols |
|:---------|:--------------|:--------------------------------------------|-----:|-----:|
| datasets | AirPassengers | Monthly Airline Passenger Numbers 1949-1960 |  144 |    2 |
| datasets | BJsales       | Sales Data with Leading Indicator           |  150 |    2 |
| datasets | BOD           | Biochemical Oxygen Demand                   |    6 |    2 |
| datasets | CO2           | Carbon Dioxide Uptake in Grass Plants       |  237 |    2 |
| datasets | Formaldehyde  | Determination of Formaldehyde               |    6 |    2 |
| datasets | HairEyeColor  | Hair and Eye Color of Statistics Students   |    4 |    4 |

…that we can filter freely, according to our needs:

r_datasets %>% filter(Rows >= 1000 & Cols >= 50) %>% kable()

| Package    | Item     | Title                                                                              | Rows | Cols | csv | doc |
|:-----------|:---------|:-----------------------------------------------------------------------------------|-----:|-----:|:----|:----|
| Ecdat      | Car      | Stated Preferences for Car Choice                                                  | 4654 |   70 | CSV | DOC |
| psych      | epi      | Eysenck Personality Inventory (EPI) data for 3570 participants                     | 3570 |   57 | CSV | DOC |
| psych      | msq      | 75 mood items from the Motivational State Questionnaire for 3896 participants      | 3896 |   92 | CSV | DOC |
| mosaicData | HELPfull | Health Evaluation and Linkage to Primary Care                                      | 1472 |  788 | CSV | DOC |
| ISLR       | Caravan  | The Insurance Company (TIC) Benchmark                                              | 5822 |   86 | CSV | DOC |
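
r_datasets %>% filter(grepl("cat", Item)) %>% kable()

| Package    | Item      | Title                                      | Rows | Cols | csv | doc |
|:-----------|:----------|:-------------------------------------------|-----:|-----:|:----|:----|
| boot       | catsM     | Weight Data for Domestic Cats              |   97 |    3 | CSV | DOC |
| robustbase | education | Education Expenditure Data                 |   50 |    6 | CSV | DOC |
| MASS       | cats      | Anatomical Data from Domestic Cats         |  144 |    3 | CSV | DOC |
| psych      | cattell   | 12 cognitive variables from Cattell (1963) |   12 |   12 | CSV | DOC |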
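
Note that `grepl("cat", Item)` matches "cat" anywhere in the name, which is why `education` shows up in the results. Here is a small sketch of a stricter search, followed by loading one of the hits. The `toy` data frame is a hypothetical miniature of `r_datasets` built from the rows shown above, and the last step assumes the MASS package is installed (it normally ships with R):

```r
library(dplyr)

# Hypothetical miniature of r_datasets, copied from the rows shown above
toy <- data.frame(
  Package = c("boot", "robustbase", "MASS", "psych"),
  Item    = c("catsM", "education", "cats", "cattell"),
  stringsAsFactors = FALSE
)

# "^cat" anchors the match at the start of the name, so "education" drops out
starts_with_cat <- toy %>% filter(grepl("^cat", Item))
starts_with_cat$Item

# Once a dataset is picked, load it from its package by name
data("cats", package = "MASS")
dim(cats)
```

The dimensions reported by `dim(cats)` should agree with the Rows and Cols columns of the scraped table.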

This totally made my life easier, so I hope it will help you, too!