Everybody Loves Raymond: Running Animal Crossing Villagers through the Google Vision API
Animal Crossing: New Horizons kept me sane throughout the first Melbourne COVID lockdown. Now, in lockdown 4, it seems right that I should look back at this cheerful, relaxing game and do some data stuff. I’m going to take the Animal Crossing villagers in the Tidy Tuesday Animal Crossing dataset and combine them with survey data from the Animal Crossing Portal, giving each villager a measure of popularity. I’ll use the Google Cloud Vision API to annotate each of the villager thumbnails, and with these train a (pretty poor) model of villager popularity.
library(tidyverse)
library(tidymodels)
library(glue)
library(httr)
library(ggimage)
library(patchwork)
library(lime)
Retrieve the villager popularity votes
The Animal Crossing Portal is a fan site that runs a monthly poll on favourite villagers. They keep historical data in publicly available Google Sheets, which makes a data scientist like me very happy.
The sheet is a list of votes, but two columns to the side tally the total votes for each villager. Since the vote list is much longer than the tally, those columns come with a lot of dangling empty rows. I’ll grab the two tally columns and delete the empty rows.
popularity_url <- "https://docs.google.com/spreadsheets/d/1ADak5KpVYjeSRNN4qudYERMotPkeRP5n4rN_VpOQm4Y/edit#gid=0"
googlesheets4::gs4_deauth() # disable authentication for this public sheet
popularity <- googlesheets4::read_sheet(popularity_url) %>%
transmute( # transmute combines mutate and select
name = Villagers,
popularity = Tally
) %>%
na.omit()
#> Reading from "April 2021 Poll Final Vote Count"
#> Range "Sheet1"
#> New names:
#> * `` -> ...7
#> * `` -> ...10
popularity %>% arrange(-popularity) %>% head()
#> # A tibble: 6 x 2
#> name popularity
#> <chr> <dbl>
#> 1 Marshal 725
#> 2 Raymond 656
#> 3 Sherb 579
#> 4 Zucker 558
#> 5 Judy 421
#> 6 Fauna 407
Retrieve the Tidy Tuesday villager data
I always come late to the Tidy Tuesday party. This is the dataset from 2020-05-05. It contains a data frame of every villager available in Animal Crossing: New Horizons (at the time), with their gender, species, and a few other attributes. It also contains a url column pointing to a thumbnail of the villager — I’ll use this later when I’m querying the Vision API.
tidy_tuesday_data <- tidytuesdayR::tt_load("2020-05-05")
#>
#> Downloading file 1 of 4: `critic.tsv`
#> Downloading file 2 of 4: `items.csv`
#> Downloading file 3 of 4: `user_reviews.tsv`
#> Downloading file 4 of 4: `villagers.csv`
tidy_tuesday_villagers <- tidy_tuesday_data$villagers
tidy_tuesday_villagers %>% head()
#> # A tibble: 6 x 11
#> row_n id name gender species birthday personality song phrase full_id
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2 admir… Admir… male bird 1-27 cranky Steep… aye a… village…
#> 2 3 agent… Agent… female squirr… 7-2 peppy DJ K.… sidek… village…
#> 3 4 agnes Agnes female pig 4-21 uchi K.K. … snuff… village…
#> 4 6 al Al male gorilla 10-18 lazy Steep… Ayyee… village…
#> 5 7 alfon… Alfon… male alliga… 6-9 lazy Fores… it'sa… village…
#> 6 8 alice Alice female koala 8-19 normal Surfi… guvnor village…
#> # … with 1 more variable: url <chr>
Running assertions against datasets is a good idea. I’ll check that I have a popularity score for every villager. There are villagers in the popularity data that aren’t in the Tidy Tuesday data, but this is to be expected, as new characters have been released since the Tidy Tuesday dataset was published. I’ll also check that there are no missing values in columns that I care about — there are missing values for the villagers' favourite songs, but I don’t need that information.
tidy_tuesday_villagers %>%
anti_join(popularity, by = "name") %>%
{assertthat::assert_that(nrow(.) == 0)}
#> [1] TRUE
tidy_tuesday_villagers %>%
select(-song) %>%
complete.cases() %>%
all() %>%
assertthat::assert_that()
#> [1] TRUE
With those checks done, I can safely join:
villagers <- tidy_tuesday_villagers %>% left_join(popularity, by = "name")
This data is fun to plot
Those thumbnails add a bit of flair to any plot. It should come as no surprise to any Animal Crossing fan that Marshal is the favourite:
villagers %>%
arrange(-popularity) %>%
head(10) %>%
mutate(name = factor(name, levels = name)) %>%
ggplot(aes(x = name, y = popularity, fill = name)) +
geom_bar(stat = "identity") +
geom_image(
aes(x = name, y = popularity - 70, image = url),
size = 0.07
) +
ggtitle("Marshal is the most popular villager") +
theme(
text = element_text(size = 16),
legend.position = "none",
axis.title.x = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
aspect.ratio = 1
)
Animal Crossing villagers are sorted into 35 different species. Some are more loved than others. The popularity densities have long tails, so taking the log here makes them plot a lot better:
villagers %>%
filter(species %in% c("cat", "chicken", "squirrel")) %>%
ggplot(aes(x = log(popularity), group = species, fill = species)) +
geom_density(alpha = 0.4) +
theme(text = element_text(size = 16)) +
ggtitle("Cats are more popular than chickens")
Octopuses are particularly loved, though. There are only 3 octopus villagers, but their mean popularity is 366, as opposed to the overall mean popularity of 57. People really like Zucker!
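Those numbers can be checked straight from the joined data (a quick sketch; the species label “octopus” follows the lowercase convention visible in the data above):
octopuses <- villagers %>% filter(species == "octopus")
nrow(octopuses) # the 3 octopus villagers
mean(octopuses$popularity) # their mean popularity, 366
mean(villagers$popularity) # the overall mean, 57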
Authenticating with Google Cloud
By this point I’ve already set up an account and project with the Google Cloud Platform (GCP), and enabled the relevant APIs. I won’t go into that detail here, since the GCP documentation is pretty good. However, I still need to authenticate myself to the GCP before I can use any of its services.
There’s no all-encompassing R SDK for the Google Cloud Platform. A few services can be used with packages provided by the CloudyR project, but there’s nothing for the Vision API. I’m happy to use Google’s HTTP APIs directly, but the authentication usually trips me up. Fortunately, the gargle package is excellent, and makes the authentication much simpler than doing it manually.
Following the instructions provided by Google, I created a service account with read/write access to Cloud Storage and permissions to use the Vision API. The actual credentials are kept in a JSON file. Within my .Renviron file (hint: usethis::edit_r_environ() will open this in RStudio) I set the “GOOGLE_APPLICATION_CREDENTIALS” environment variable to the path of this JSON.
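The entry is a single line, with the path swapped for wherever the key actually lives:
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-credentials.json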
Now I can use the gargle package to create a token with the appropriate scopes:
gcp_token <- gargle::credentials_service_account(
scopes = c(
"https://www.googleapis.com/auth/cloud-vision",
"https://www.googleapis.com/auth/devstorage.read_write"
),
path = Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
This token can be passed into httr verbs (in fact, it’s a httr::TokenServiceAccount) where it will be used for authentication. httr handles all of the stuff I don’t want to think about, like token refreshing and authentication headers.
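As a quick smoke test, listing the objects in a bucket is just a GET with the token attached (a sketch against the Cloud Storage JSON API; the animal-crossing bucket itself is created in the next section):
response <- GET(
  "https://storage.googleapis.com/storage/v1/b/animal-crossing/o",
  config(token = gcp_token)
)
status_code(response) # expect 200 if the token is working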
Uploading the images
I can query the Vision API with image data directly, but another option is to keep the thumbnails in a Cloud Storage bucket. I created an animal-crossing bucket through the Google Cloud Platform console. I’ll create a function for uploading villager images. I assume villager to be a single row of the villagers data frame, so that I can effectively treat it like a list. This function will:
- download villager$url to a temp file and use on.exit to clean up afterwards,
- define the name of the object I’m creating, using the villager’s id,
- use httr::POST to post the image using my gcp_token, and finally
- check that the resulting status code is 200 (success)
upload_villager_image <- function(villager) {
temp <- tempfile()
on.exit(unlink(temp))
download.file(villager$url, temp)
object_name <- paste0(villager$id, ".png")
response <- POST(
glue("https://storage.googleapis.com/upload/storage/v1/b/animal-crossing/o?uploadType=media&name={object_name}"),
body = upload_file(temp, type = "image/png"),
config(token = gcp_token)
)
if (status_code(response) != 200) {
stop(glue("Upload of {villager$id} failed with status code {status_code(response)}"))
}
}
If I can upload a single villager image, I can upload them all. I use purrr to iterate through the rows of the villagers data frame, uploading each of the 391 villager images.
walk(
1:nrow(villagers),
function(row_index) {
villager <- villagers[row_index,]
upload_villager_image(villager)
}
)
A quick aside: I don’t often see code that uses purrr to iterate through the rows of a data frame like this, which makes me think I’m doing something unconventional. A better option may be to pull out villager$id and villager$url, and pass those as arguments to a binary upload_villager_image function, as in the sketch below.
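Something like this, perhaps (an untested sketch; upload_villager_image2 is a hypothetical two-argument variant of the function above):
upload_villager_image2 <- function(id, url) {
  temp <- tempfile()
  on.exit(unlink(temp))
  download.file(url, temp)
  response <- POST(
    glue("https://storage.googleapis.com/upload/storage/v1/b/animal-crossing/o?uploadType=media&name={id}.png"),
    body = upload_file(temp, type = "image/png"),
    config(token = gcp_token)
  )
  stopifnot(status_code(response) == 200)
}
walk2(villagers$id, villagers$url, upload_villager_image2)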
Annotating the villagers
With the images uploaded to Cloud Storage, I can query the Cloud Vision API with the path to a given thumbnail. For example, I can give gs://animal-crossing/tangy.png as an argument to the images:annotate endpoint.
The response is a list of labels, each consisting of a description (the label itself), a confidence score and a topicality score. I’ll flatten this to a one-row data frame (tibble) of confidence scores, with the labels as columns. This will make it easier to later concatenate the labels with the villagers data frame.
Note also the potential for the API to return duplicate labels — in this case, I take the maximum score.
annotate <- function(villager_id) {
json <- jsonlite::toJSON(
list(
requests = list(
image = list(
source = list(
gcsImageUri = glue::glue("gs://animal-crossing/{villager_id}.png")
)
),
features = list(list(
maxResults = 50,
type = "LABEL_DETECTION"
))
)
),
auto_unbox = TRUE
)
response <- POST(
"https://vision.googleapis.com/v1/images:annotate",
body = json,
config(token = gcp_token),
add_headers(`Content-Type` = "application/json; charset=utf-8")
)
if (status_code(response) != 200) {
stop("Error labelling ", villager)
}
content(response)$responses[[1]]$labelAnnotations %>%
map(as_tibble) %>%
reduce(bind_rows) %>%
select(description, score) %>%
pivot_wider(names_from = description, values_from = score, values_fn = max) %>%
janitor::clean_names()
}
I ask for 50 labels, but the API appears not to return labels with a confidence score of less than 0.5, so I may get fewer:
annotate("audie")
#> # A tibble: 1 x 19
#> head toy cartoon fashion_design art sunglasses electric_blue eyewear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.972 0.921 0.812 0.808 0.801 0.746 0.741 0.699
#> # … with 11 more variables: magenta <dbl>, fictional_character <dbl>,
#> # goggles <dbl>, doll <dbl>, pattern <dbl>, entertainment <dbl>,
#> # figurine <dbl>, visual_arts <dbl>, performing_arts <dbl>, child_art <dbl>,
#> # painting <dbl>
This isn’t very pretty to look at, so I’ll make a nice plot:
plot_villager <- function(villager_id) {
villager <- villagers %>% filter(id == villager_id)
if (nrow(villager) == 0) {
stop("Couldn't find villager with id ", villager_id)
}
villager_plot <- villager_id %>%
annotate() %>%
pivot_longer(everything(), names_to = "label", values_to = "score") %>%
top_n(8, wt = score) %>%
mutate(label = factor(label, levels = rev(.$label))) %>%
ggplot(aes(x = label, y = score, fill = label)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette="Set1") +
theme(
legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(size = 20),
plot.title = element_text(size = 32)
) +
ggtitle(villager$name) +
coord_flip()
villager_image <- png::readPNG(
curl::curl_fetch_memory(villager$url)$content,
native = TRUE
)
villager_plot + villager_image
}
plot_villager("audie")
An attempt at machine learning
Readers of my blog should expect this by now, but I tend not to care about model accuracy in these posts. My interest is always in the process of building a model, rather than the model itself. A warning ahead: the model I’m about to train here will perform terribly.
I don’t believe model tuning or trying different techniques would help here. The dataset is very sparse and wide, so there’s not a lot of information to model.
Label all villagers
I’ve defined a function for annotating a single villager, but I have 391 to label. Google Cloud does have a batch annotation API, but I decided to save the coding effort and just re-use my single-villager annotation function with purrr.
The following can take a few minutes. At times progress was stalling, and I suspect I was brushing up against some API limits. The Sys.sleep(0.5) is intended to address that, but I’m only speculating.
labels <- map(villagers$id, function(x) {Sys.sleep(0.5); list(annotate(x))}) %>%
reduce(bind_rows) %>%
rename_all(~glue("label_{.x}"))
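If API limits really were the issue, a sturdier alternative might be purrr::insistently, which retries a failing call with exponential backoff instead of sleeping unconditionally (a sketch; I haven’t tested it against the API’s actual limits):
annotate_insistently <- insistently(
  annotate, # retry whenever annotate() errors, e.g. on a non-200 response
  rate = rate_backoff(pause_base = 1, max_times = 5)
)
This drop-in wrapper could then replace annotate in the map call above.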
I’ve prefixed every label with “label_” so that I can identify these columns later in data pre-processing. Setting up a sensible column naming convention will let me use the powerful tidyselect::starts_with selector.
labels is a wide data frame with 413 columns, but 94% of its entries are NA. This is because the Cloud Vision API returns only the labels it deems most relevant. It also seems to not return any labels with a “score” of less than 0.5. The end result of dplyr::bind_rows is a wide, sparse data frame of floats and NAs.
I’ll have to deal with this problem in pre-processing. For now I’ll combine labels with the villagers data frame:
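labels has one row per villager, in the same order as villagers, so a column bind is all that’s needed (my assumption of how the combination goes; villagers_labelled is the name used in the split below):
villagers_labelled <- villagers %>%
  bind_cols(labels) # 391 villagers, each with popularity plus label scores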
Pre-processing
I’ll use the recipes package to pre-process the data before modelling. This is one of my favourite packages, and a real star of tidymodels. First I’ll do a simple train/test split, since my pre-processing strategy can’t depend on the test data:
split <- initial_split(villagers_labelled, prop = 0.8)
train <- training(split)
dim(train)
#> [1] 312 425
test <- testing(split)
dim(test)
#> [1] 79 425
To mitigate the impact of the sparsity, I’ll remove any labels that are blank more than half the time in the training data. I’ll make a note of these now:
too_many_missing <- train %>%
select(starts_with("label")) %>%
select_if(~sum(is.na(.x))/length(.x) > 0.5) %>%
colnames()
I can’t find documentation to confirm this, but it appears as though the Google Cloud Vision API won’t return a label with a score of less than 0.5. One way to deal with the sparsity of these labels is to binarise them — TRUE if the label is present, otherwise FALSE. This turns the labels into features that effectively say, “Did the Cloud Vision API detect this label?”.
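The step_mutate_at in the recipe below does exactly this. As a toy illustration of the transformation (with made-up scores):
tibble(label_toy = c(0.97, NA, 0.62)) %>%
  mutate(across(starts_with("label"), ~as.integer(!is.na(.x)))) # gives 1, 0, 1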
Species is also a difficult predictor here — in the training set there are 35 different species amongst 312 villagers. I’ll collapse the uncommon species into an “other” category.
The remaining pre-processing steps are fairly standard — discarding unneeded columns, converting strings to factors, and applying one-hot encoding. I’ll also keep using log(popularity) here, to deal with those long tails in the popularity scores.
pre_processing <- recipe(train, popularity ~ .) %>%
step_rm(row_n, id, name, birthday, song, phrase, full_id, url) %>%
step_rm(one_of(too_many_missing)) %>%
step_mutate_at(starts_with("label"), fn = ~as.integer(!is.na(.x))) %>%
step_string2factor(has_type("character")) %>%
step_other(species, threshold = 0.03) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
step_log(popularity, skip = TRUE)
An xgboost model
The processed train data has 37 columns, but is of (matrix) rank 34. Informally, this means that the training data is bigger than the information it contains. Linear models will throw warnings here. Tree-based methods will hide the problem, but there’s no escaping the fact that any model trained on this data will be terrible.
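That rank is easy to check by prepping the recipe on the training data and extracting the processed predictors (a quick sketch):
prepped <- prep(pre_processing, training = train)
processed_train <- juice(prepped) # the pre-processed training data
predictors <- as.matrix(select(processed_train, -popularity))
ncol(predictors) # the number of predictor columns
qr(predictors)$rank # the matrix rank, which falls short of the column count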
I’ll set up an xgboost model with the parsnip package, allowing for tuning the tree_depth and mtry parameters. Here, mtry refers to the number of predictors available to the model at each split. Finally, I’ll combine the pre-processing and the model into a workflow.
xgboost_model <- boost_tree(trees = 200, mtry = tune(), tree_depth = tune()) %>%
set_engine("xgboost") %>%
set_mode("regression")
xgboost_workflow <- workflow() %>%
add_recipe(pre_processing) %>%
add_model(xgboost_model)
xgboost_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: boost_tree()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 7 Recipe Steps
#>
#> • step_rm()
#> • step_rm()
#> • step_mutate_at()
#> • step_string2factor()
#> • step_other()
#> • step_dummy()
#> • step_log()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Boosted Tree Model Specification (regression)
#>
#> Main Arguments:
#> mtry = tune()
#> trees = 200
#> tree_depth = tune()
#>
#> Computational engine: xgboost
I’ll tune the model, relying on the default grid for tree_depth and mtry, and using 5-fold cross-validation:
folds <- vfold_cv(train, v = 5)
tune_results <- tune_grid(xgboost_workflow, resamples = folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
I’ll use whichever mtry and tree_depth parameters minimise root mean-squared error to finalise my workflow, and fit it to the train data.
fitted_xgboost_workflow <- xgboost_workflow %>%
finalize_workflow(select_best(tune_results, metric = "rmse")) %>%
fit(train)
It’s time to see just how bad this model is. Recall that I took the log of the popularity in the training data, so to truly evaluate the performance I have to take the exp of the predictions.
test_performance <- test %>%
mutate(
predicted = predict(fitted_xgboost_workflow, test)$.pred %>% exp(),
residual = popularity - predicted
)
metric_set(rmse, mae)(test_performance, popularity, predicted)
#> # A tibble: 2 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 108.
#> 2 mae standard 58.0
Oof, that model is pretty bad. I wonder if it’s because the distribution of popularity isn’t uniform? I’ll compare the predicted and actual values to see if there’s a difference at the extreme ends:
test_performance %>%
ggplot(aes(x = predicted, y = popularity)) +
geom_point() +
geom_abline(intercept = 0, slope = 1)
Sure enough, that seems to be the case. For values below about 50, the model isn’t too bad, and certainly better than it performs for the more popular villagers.
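Splitting the test set at that point makes the gap concrete (a quick sketch; the threshold of 50 is eyeballed from the plot):
test_performance %>%
  mutate(popular = popularity >= 50) %>%
  group_by(popular) %>%
  summarise(mae = mean(abs(residual)))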
Model interpretability
I tried to use some model interpretability techniques to see what effect these labels were having on villager popularity. Unfortunately, I had trouble applying either LIME or SHAP:
- The lime package throws many, many warnings. I’m not surprised: the inputs are rank-deficient matrices and the LIME technique relies on linear models.
- The shapr package doesn’t support explanations for more than 30 features.
I’ll show the results of my lime analysis here, with the understanding that the results are almost certainly nonsense.
First I’ll separate the pre-processing function and model object from the workflow, since neither lime nor shapr can handle the in-built pre-processing of a workflow object:
pre_processing_function <- function(x) {
pull_workflow_prepped_recipe(fitted_xgboost_workflow) %>%
bake(x) %>%
select(-popularity)
}
fitted_xgboost_model <- pull_workflow_fit(fitted_xgboost_workflow)
Then I fit the explainer. The quantile binning approach just doesn’t work with such sparse data, so I disable it.
explainer <- lime(
pre_processing_function(train),
fitted_xgboost_model,
quantile_bins = FALSE
)
Now I’ll explain a few test cases and plot the results. I’ll suppress the warnings that would usually appear here.
test_case <- sample_n(test, 10)
explanations <- suppressWarnings(
explain(
pre_processing_function(test_case),
explainer,
n_features = 6
)
)
plot_explanations(explanations) +
scale_x_discrete(labels = test_case$name)
The Animal Crossing franchise and its fictional characters are the property of Nintendo. The thumbnail images of Animal Crossing villagers on this page are used for the purposes of study and commentary.
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.0 (2021-05-18)
#> os macOS Big Sur 11.3
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2021-06-07
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> askpass 1.1 2019-01-13 [1] CRAN (R 4.1.0)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> BiocManager 1.30.15 2021-05-11 [1] CRAN (R 4.1.0)
#> broom * 0.7.6 2021-04-05 [1] CRAN (R 4.1.0)
#> cachem 1.0.4 2021-02-13 [1] CRAN (R 4.1.0)
#> callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
#> class 7.3-19 2021-05-03 [1] CRAN (R 4.1.0)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.1.0)
#> codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.0)
#> colorspace 2.0-1 2021-05-04 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> curl 4.3.1 2021-04-30 [1] CRAN (R 4.1.0)
#> data.table 1.14.0 2021-02-21 [1] CRAN (R 4.1.0)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
#> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
#> desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
#> devtools 2.4.0 2021-04-07 [1] CRAN (R 4.1.0)
#> dials * 0.0.9 2020-09-16 [1] CRAN (R 4.1.0)
#> DiceDesign 1.9 2021-02-13 [1] CRAN (R 4.1.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
#> downlit 0.2.1 2020-11-04 [1] CRAN (R 4.1.0)
#> dplyr * 1.0.5 2021-03-05 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.1.0)
#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
#> foreach 1.5.1 2020-10-15 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> furrr 0.2.2 2021-01-29 [1] CRAN (R 4.1.0)
#> future 1.21.0 2020-12-10 [1] CRAN (R 4.1.0)
#> gargle 1.1.0 2021-04-02 [1] CRAN (R 4.1.0)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
#> ggimage * 0.2.8 2020-04-02 [1] CRAN (R 4.1.0)
#> ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.1.0)
#> ggplotify 0.0.7 2021-05-11 [1] CRAN (R 4.1.0)
#> glmnet 4.1-1 2021-02-21 [1] CRAN (R 4.1.0)
#> globals 0.14.0 2020-11-22 [1] CRAN (R 4.1.0)
#> glue * 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> gower 0.2.2 2020-06-23 [1] CRAN (R 4.1.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.1.0)
#> gridGraphics 0.5-1 2020-12-13 [1] CRAN (R 4.1.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
#> hardhat 0.1.5 2020-11-09 [1] CRAN (R 4.1.0)
#> haven 2.4.1 2021-04-23 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> hms 1.0.0 2021-01-13 [1] CRAN (R 4.1.0)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
#> httr * 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
#> hugodown 0.0.0.9000 2021-05-16 [1] Github (r-lib/hugodown@97ea0cd)
#> infer * 0.5.4 2021-01-13 [1] CRAN (R 4.1.0)
#> ipred 0.9-11 2021-03-12 [1] CRAN (R 4.1.0)
#> iterators 1.0.13 2020-10-15 [1] CRAN (R 4.1.0)
#> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
#> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
#> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0)
#> lava 1.6.9 2021-03-11 [1] CRAN (R 4.1.0)
#> lhs 1.1.1 2020-10-05 [1] CRAN (R 4.1.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
#> lime * 0.5.2 2021-02-24 [1] CRAN (R 4.1.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.1.0)
#> lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
#> magick 2.7.2 2021-05-02 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.0)
#> Matrix 1.3-3 2021-05-04 [1] CRAN (R 4.1.0)
#> memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
#> modeldata * 0.1.0 2020-10-22 [1] CRAN (R 4.1.0)
#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
#> nnet 7.3-16 2021-05-03 [1] CRAN (R 4.1.0)
#> openssl 1.4.4 2021-04-30 [1] CRAN (R 4.1.0)
#> parallelly 1.25.0 2021-04-30 [1] CRAN (R 4.1.0)
#> parsnip * 0.1.6 2021-05-27 [1] CRAN (R 4.1.0)
#> patchwork * 1.1.1 2020-12-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0)
#> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
#> pROC 1.17.0.1 2021-01-13 [1] CRAN (R 4.1.0)
#> processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.1.0)
#> ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.1.0)
#> readr * 1.4.0 2020-10-05 [1] CRAN (R 4.1.0)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
#> recipes * 0.1.16 2021-04-16 [1] CRAN (R 4.1.0)
#> remotes 2.3.0 2021-04-01 [1] CRAN (R 4.1.0)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
#> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.1.0)
#> rpart 4.1-15 2019-04-12 [1] CRAN (R 4.1.0)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
#> rsample * 0.1.0 2021-05-08 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> rvcheck 0.1.8 2020-03-01 [1] CRAN (R 4.1.0)
#> rvest 1.0.0 2021-03-09 [1] CRAN (R 4.1.0)
#> scales * 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> shape 1.4.6 2021-05-19 [1] CRAN (R 4.1.0)
#> stringi 1.6.1 2021-05-10 [1] CRAN (R 4.1.0)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> survival 3.2-11 2021-04-26 [1] CRAN (R 4.1.0)
#> testthat 3.0.2 2021-02-14 [1] CRAN (R 4.1.0)
#> tibble * 3.1.2 2021-05-16 [1] CRAN (R 4.1.0)
#> tidymodels * 0.1.3 2021-04-19 [1] CRAN (R 4.1.0)
#> tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 4.1.0)
#> tune * 0.1.5 2021-04-23 [1] CRAN (R 4.1.0)
#> usethis 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> workflows * 0.2.2 2021-03-10 [1] CRAN (R 4.1.0)
#> workflowsets * 0.0.2 2021-04-16 [1] CRAN (R 4.1.0)
#> xfun 0.22 2021-03-11 [1] CRAN (R 4.1.0)
#> xgboost 1.4.1.1 2021-04-22 [1] CRAN (R 4.1.0)
#> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#> yardstick * 0.0.8 2021-03-28 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library