Bootstrapping R functions

Bootstrapping R functions

Suppose I want a function that runs some setup code before it runs the first time. Maybe I’m using dplyr but I haven’t properly declared all of my dplyr calls in my function, so I want to run library(dplyr) before the actual function is run. Or maybe I want to install a package if it isn’t already installed, or restore a renv file, or any other setup process. I only want this special code to run the first time my function is called. After that, the function that runs should be exactly as I declared it, with none of the setup code.

Here’s what I do:

  1. Create a new function with the same signature as my target function.
  2. Capture my setup code, and evaluate it when my new bootstrapping function is called.
  3. When my bootstrapping function is being executed, make it redefine itself with my target function in the same environment.
  4. After the redefinition, call the function again, which is now the redefined function, which is now my target function.

And then, to make things even more bizarre, I wrap this process up in a function-generating function that does all of this for me, with an input of just a function and some setup code.

By the way, I’m not actually suggesting you do this. It’s a wild idea. Functions redefining themselves is an uncomfortable concept. And the idea that a function is running complicated setup code that isn’t even hinted at in the function name makes me uneasy as well. But you can do this. So here it is:

bootstrapping_function <- function(fn, setup) {
  setup <- substitute(setup)
  bootstrapping_function <- fn # Copy the function so we can keep its formals
  body(bootstrapping_function) <- substitute({
    # The name of the function that's currently being executed.
    this_function_name <- as.character(match.call()[[1]])

    # We want to redefine the function in the same environment in which it's
    # currently defined. This function crawls up the environment hierarchy
    # until it finds an object with the right name. Possible improvement:
    # ignore any objects with the right name if they aren't functions.
    which_environment <- function(name, env = parent.frame()) {
      # Adapted from http://adv-r.had.co.nz/Environments.html
      if (identical(env, emptyenv())) {
        stop("Can't find ", name, call. = FALSE)
      } else if (exists(name, envir = env, inherits = FALSE)) {
        env
      } else {
        which_environment(name, parent.env(env))
      }
    }
    this_function_env <- which_environment(this_function_name)

    # Recover the arguments that are being provided to this function at
    # run-time, as a list. This lets us execute the function again after it's
    # been redefined.
    get_args <- function() {
      # Adapted from https://stackoverflow.com/a/47955845/8456369
      parent_formals <- formals(sys.function(sys.parent(n = 1)))
      fnames <- names(parent_formals)
      without_ellipses <- fnames[fnames != "..."]
      args <- evalq(as.list(environment()), envir = parent.frame())
      if ("..." %in% fnames) {
        c(args[without_ellipses], evalq(list(...), envir = parent.frame()))
      } else {
        args[without_ellipses]
      }
    }

    fn_location <- which_environment(this_function_name)
    eval(setup, parent.frame(2)) # evaluate in caller_env
    assign(this_function_name, fn, this_function_env) # here's the redefinition
    do.call( # call the function again with the same arguments
      this_function_name,
      args = get_args(),
      envir = parent.frame(2)
    )
  })
  bootstrapping_function
}

I haven’t thrown a lot of test cases at this code yet, but here’s a simple example: take a data frame and add 1 to every numeric column. I’ve written the code with dplyr, but I’ve used mutate_if instead of dplyr::mutate_if. I’ll need to call library(dplyr) before I run this function. I’ll put an extra message() in the setup code to make it clear that I’m actually running the setup.

add_1_to_all_numeric_columns <- bootstrapping_function(
  function(df) mutate_if(df, is.numeric, ~.x + 1),
  setup = {
    message("Setting up the function to add 1 to all numeric columns")
    library(dplyr)
  }
)

Let’s run this monstrousity:

head(add_1_to_all_numeric_columns(mtcars))
#> Setting up the function to add 1 to all numeric columns
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 22.0   7  161 111 4.90 3.620 17.46  1  2    5    5
#> 2 22.0   7  161 111 4.90 3.875 18.02  1  2    5    5
#> 3 23.8   5  109  94 4.85 3.320 19.61  2  2    5    2
#> 4 22.4   7  259 111 4.08 4.215 20.44  2  1    4    2
#> 5 19.7   9  361 176 4.15 4.440 18.02  1  1    4    3
#> 6 19.1   7  226 106 3.76 4.460 21.22  2  1    4    2

Sure enough, the function has been redefined:

add_1_to_all_numeric_columns
#> function(df) mutate_if(df, is.numeric, ~.x + 1)
#> <bytecode: 0x55e45ccb8c28>
#> <environment: 0x55e45cc770b0>

And now, if I run it a second time, there’s no setup:

head(add_1_to_all_numeric_columns(mtcars))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 22.0   7  161 111 4.90 3.620 17.46  1  2    5    5
#> 2 22.0   7  161 111 4.90 3.875 18.02  1  2    5    5
#> 3 23.8   5  109  94 4.85 3.320 19.61  2  2    5    2
#> 4 22.4   7  259 111 4.08 4.215 20.44  2  1    4    2
#> 5 19.7   9  361 176 4.15 4.440 18.02  1  1    4    3
#> 6 19.1   7  226 106 3.76 4.460 21.22  2  1    4    2

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Ubuntu 20.04 LTS            
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_AU:en                    
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2020-07-07                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                            
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                    
#>  backports     1.1.8      2020-06-17 [1] CRAN (R 4.0.0)                    
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 4.0.0)                    
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)                    
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                    
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                    
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 4.0.0)                    
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)                    
#>  downlit       0.0.0.9000 2020-06-15 [1] Github (r-lib/downlit@9191e1f)    
#>  dplyr       * 1.0.0      2020-05-29 [1] CRAN (R 4.0.0)                    
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                    
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                    
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                    
#>  fs            1.4.1      2020-04-04 [1] CRAN (R 4.0.0)                    
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 4.0.0)                    
#>  glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)                    
#>  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.0)                    
#>  hugodown      0.0.0.9000 2020-06-20 [1] Github (r-lib/hugodown@f7df565)   
#>  knitr         1.28       2020-02-06 [1] CRAN (R 4.0.0)                    
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                    
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)                    
#>  memoise       1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)   
#>  pillar        1.4.4      2020-05-05 [1] CRAN (R 4.0.0)                    
#>  pkgbuild      1.0.7      2020-04-25 [1] CRAN (R 4.0.0)                    
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)                    
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 4.0.0)                    
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                    
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 4.0.0)                    
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 4.0.0)                    
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                    
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)                    
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 4.0.0)                    
#>  rlang         0.4.6      2020-05-02 [1] CRAN (R 4.0.0)                    
#>  rmarkdown     2.3.1      2020-06-20 [1] Github (rstudio/rmarkdown@b53a85a)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)                    
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                    
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)                    
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                    
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)                    
#>  tibble        3.0.1      2020-04-20 [1] CRAN (R 4.0.0)                    
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.0)                    
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.0)                    
#>  vctrs         0.3.1      2020-06-05 [1] CRAN (R 4.0.0)                    
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)                    
#>  xfun          0.14       2020-05-20 [1] CRAN (R 4.0.0)                    
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                    
#> 
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

The image at the top of this page is in the public domain.