Bootstrapping R functions
Suppose I want a function that runs some setup code before it runs the first time. Maybe I’m using dplyr but I haven’t properly declared all of my dplyr calls in my function, so I want to run library(dplyr)
before the actual function is run. Or maybe I want to install a package if it isn’t already installed, or restore a renv
file, or any other setup process. I only want this special code to run the first time my function is called. After that, the function that runs should be exactly as I declared it, with none of the setup code.
Here’s what I do:
- Create a new function with the same signature as my target function.
- Capture my setup code, and evaluate it when my new bootstrapping function is called.
- When my bootstrapping function is being executed, make it redefine itself with my target function in the same environment.
- After the redefinition, call the function again, which is now the redefined function, which is now my target function.
And then, to make things even more bizarre, I wrap this process up in a function-generating function that does all of this for me, with an input of just a function and some setup code.
By the way, I’m not actually suggesting you do this. It’s a wild idea. Functions redefining themselves is an uncomfortable concept. And the idea that a function is running complicated setup code that isn’t even hinted at in the function name makes me uneasy as well. But you can do this. So here it is:
bootstrapping_function <- function(fn, setup) {
setup <- substitute(setup)
bootstrapping_function <- fn # Copy the function so we can keep its formals
body(bootstrapping_function) <- substitute({
# The name of the function that's currently being executed.
this_function_name <- as.character(match.call()[[1]])
# We want to redefine the function in the same environment in which it's
# currently defined. This function crawls up the environment hierarchy
# until it finds an object with the right name. Possible improvement:
# ignore any objects with the right name if they aren't functions.
which_environment <- function(name, env = parent.frame()) {
# Adapted from http://adv-r.had.co.nz/Environments.html
if (identical(env, emptyenv())) {
stop("Can't find ", name, call. = FALSE)
} else if (exists(name, envir = env, inherits = FALSE)) {
env
} else {
which_environment(name, parent.env(env))
}
}
this_function_env <- which_environment(this_function_name)
# Recover the arguments that are being provided to this function at
# run-time, as a list. This lets us execute the function again after it's
# been redefined.
get_args <- function() {
# Adapted from https://stackoverflow.com/a/47955845/8456369
parent_formals <- formals(sys.function(sys.parent(n = 1)))
fnames <- names(parent_formals)
without_ellipses <- fnames[fnames != "..."]
args <- evalq(as.list(environment()), envir = parent.frame())
if ("..." %in% fnames) {
c(args[without_ellipses], evalq(list(...), envir = parent.frame()))
} else {
args[without_ellipses]
}
}
fn_location <- which_environment(this_function_name)
eval(setup, parent.frame(2)) # evaluate in caller_env
assign(this_function_name, fn, this_function_env) # here's the redefinition
do.call( # call the function again with the same arguments
this_function_name,
args = get_args(),
envir = parent.frame(2)
)
})
bootstrapping_function
}
I haven’t thrown a lot of test cases at this code yet, but here’s a simple example: take a data frame and add 1 to every numeric column. I’ve written the code with dplyr, but I’ve used mutate_if
instead of dplyr::mutate_if
. I’ll need to call library(dplyr)
before I run this function. I’ll put an extra message()
in the setup code to make it clear that I’m actually running the setup.
add_1_to_all_numeric_columns <- bootstrapping_function(
function(df) mutate_if(df, is.numeric, ~.x + 1),
setup = {
message("Setting up the function to add 1 to all numeric columns")
library(dplyr)
}
)
Let’s run this monstrousity:
head(add_1_to_all_numeric_columns(mtcars))
#> Setting up the function to add 1 to all numeric columns
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 22.0 7 161 111 4.90 3.620 17.46 1 2 5 5
#> 2 22.0 7 161 111 4.90 3.875 18.02 1 2 5 5
#> 3 23.8 5 109 94 4.85 3.320 19.61 2 2 5 2
#> 4 22.4 7 259 111 4.08 4.215 20.44 2 1 4 2
#> 5 19.7 9 361 176 4.15 4.440 18.02 1 1 4 3
#> 6 19.1 7 226 106 3.76 4.460 21.22 2 1 4 2
Sure enough, the function has been redefined:
add_1_to_all_numeric_columns
#> function(df) mutate_if(df, is.numeric, ~.x + 1)
#> <bytecode: 0x55e45ccb8c28>
#> <environment: 0x55e45cc770b0>
And now, if I run it a second time, there’s no setup:
head(add_1_to_all_numeric_columns(mtcars))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 22.0 7 161 111 4.90 3.620 17.46 1 2 5 5
#> 2 22.0 7 161 111 4.90 3.875 18.02 1 2 5 5
#> 3 23.8 5 109 94 4.85 3.320 19.61 2 2 5 2
#> 4 22.4 7 259 111 4.08 4.215 20.44 2 1 4 2
#> 5 19.7 9 361 176 4.15 4.440 18.02 1 1 4 3
#> 6 19.1 7 226 106 3.76 4.460 21.22 2 1 4 2
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Ubuntu 20.04 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-07-07
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> downlit 0.0.0.9000 2020-06-15 [1] Github (r-lib/downlit@9191e1f)
#> dplyr * 1.0.0 2020-05-29 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> hugodown 0.0.0.9000 2020-06-20 [1] Github (r-lib/hugodown@f7df565)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)
#> pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.7 2020-04-25 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.3.1 2020-06-20 [1] Github (rstudio/rmarkdown@b53a85a)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 3.0.1 2020-04-20 [1] CRAN (R 4.0.0)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> vctrs 0.3.1 2020-06-05 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.14 2020-05-20 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library