Determining system dependencies for R projects

Locking down R package dependencies and versions is a solved problem, thanks to the easy-to-use renv package. System dependencies — those Linux packages that need to be installed to make certain R packages work — are a bit harder to manage.

Option 1: Hard-coding

The easiest option is to hard-code the system dependencies. I did this recently when I was creating a Dockerfile for a very simple Plumber API:

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    make \
    libsodium-dev \
    libicu-dev \
    libcurl4-openssl-dev \
    libssl-dev

My Dockerfile used only three R packages, so its system dependencies were not complicated. There are two ways of determining which system packages to install:

  1. (The bad way) Try to build the Dockerfile and use the errors to determine which dependencies are missing, or
  2. (The good way) Use the RStudio Package Manager.

The RStudio Package Manager (RSPM) has had a huge impact on my R workflow. It just makes life easier. In this case, it tells me the system dependencies for each R package, as well as the installation commands. System dependencies vary between Linux distributions and releases, and RSPM takes this into account.

Hard-coding system dependencies into Dockerfiles or CI/CD pipelines makes sense for small, throwaway projects, but isn’t a great idea for ongoing and dynamic projects. A better option is to determine these dependencies automatically.

Option 2: From package DESCRIPTIONs

There are some great options for determining system dependencies automatically from package DESCRIPTION files. These files contain lists of dependencies for the package, so all that’s needed is an established repository to translate those R dependencies to system dependencies.

RSPM has a public API with a few endpoints for querying system dependencies. The remotes package offers the system_requirements function, which queries the RSPM API for the system dependencies of a package. The package can be on CRAN, or it can be a package under local development; in the latter case, the local DESCRIPTION file is used.

An important use case for this is creating continuous integration pipelines for a package in development. Every time the package is updated, a fresh environment is used for testing, so system dependencies need to be installed each time. The r-lib actions repository has an example of a standard package check in GitHub Actions that does this:

- name: Install system dependencies
  if: runner.os == 'Linux'
  run: |
    while read -r cmd
    do
      eval sudo $cmd
    done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')

An earlier version of this workflow used the sysreqs package, which calls on https://sysreqs.r-hub.io/ to perform this translation:

- name: Install system dependencies
  if: runner.os == 'Linux'
  env:
    RHUB_PLATFORM: linux-x86_64-ubuntu-gcc
  run: |
    Rscript -e "remotes::install_github('r-hub/sysreqs')"
    sysreqs=$(Rscript -e "cat(sysreqs::sysreq_commands('DESCRIPTION'))")
    sudo -s eval "$sysreqs"

Most of the work in this space has been done by Jim Hester, whose contributions to GitHub Actions for R have made my life much, much easier.

Option 3: From renv lock files

While the remotes::system_requirements function is great for package development, it doesn’t cover every R project. The emerging standard for managing R package dependencies is renv. Given this, and given renv’s capacity to automatically detect package dependencies, it makes sense to explore linking renv lock files to system dependencies.

Before I go on, I’ll just say that I would be very surprised if this hasn’t been done before, or someone isn’t already looking at this.

The RSPM API contains an endpoint for querying system dependencies with a list of packages, rather than a DESCRIPTION file. Here’s an example query:

http://packagemanager.rstudio.com/__api__/repos/1/sysreqs?all=false&pkgname=plumber&pkgname=rmarkdown&distribution=ubuntu&release=20.04

Despite what the Swagger page says, the package names need to be specified each with pkgname=, rather than being separated by commas.
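To make the repeated-parameter format concrete, here is a minimal sketch in base R that assembles such a query URL. The function name build_sysreqs_url and its arguments are hypothetical; the base URL and parameter names are copied from the example query above, and nothing here contacts the API.

```r
# Build an RSPM sysreqs query URL with one `pkgname=` parameter per package.
# `build_sysreqs_url` is a hypothetical helper; the base URL and the
# parameter names are taken from the example query in the text.
build_sysreqs_url <- function(
  packages,
  distribution = "ubuntu",
  release = "20.04",
  base_url = "http://packagemanager.rstudio.com/__api__/repos/1/sysreqs"
) {
  stopifnot(is.character(packages), length(packages) > 0)

  # Repeat the parameter name for every package: pkgname=a&pkgname=b
  pkg_params <- paste0("pkgname=", packages, collapse = "&")

  paste0(
    base_url,
    "?all=false&", pkg_params,
    "&distribution=", distribution,
    "&release=", release
  )
}

build_sysreqs_url(c("plumber", "rmarkdown"))
```

Called with plumber and rmarkdown, this reproduces the example URL exactly.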

The result is JSON that needs to be parsed into something usable. I’ve created a package that does just this. It’s a very low-effort package, so please don’t use it for anything serious. Or better yet, don’t use it at all. But it does show that the RSPM API supports this use case:

library(getsysreqs)

get_sysreqs(
  c("plumber", "rmarkdown"),
  distribution = "ubuntu",
  release = "20.04"
)

#> [1] "libsodium-dev"        "libcurl4-openssl-dev" "libssl-dev"          
#> [4] "make"                 "libicu-dev"           "pandoc"

With a little more JSON-parsing, it’s possible to extract the R package dependencies from an renv lock file. Here’s an example from a more complicated project:

get_sysreqs(
  "renv.lock",
  distribution = "ubuntu",
  release = "20.04"
)

#>  [1] "libcurl4-openssl-dev" "libssl-dev"           "libxml2-dev"         
#>  [4] "libgit2-dev"          "libssh2-1-dev"        "zlib1g-dev"          
#>  [7] "make"                 "git"                  "libicu-dev"          
#> [10] "pandoc"               "libglpk-dev"          "libgmp3-dev"
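The lock-file step can be sketched roughly as follows. This is not necessarily how getsysreqs does it: the helper name lockfile_packages is hypothetical, the jsonlite package is assumed to be installed, and the sketch assumes the usual renv layout in which each entry under "Packages" carries a "Source" field. Keeping only entries with Source equal to "Repository" is a rough proxy for "comes from a CRAN-like repository"; it would also keep packages from other repositories.

```r
library(jsonlite)

# Return the names of the packages recorded in an renv lock file.
# `lockfile_packages` is a hypothetical helper, not part of renv or getsysreqs.
lockfile_packages <- function(path, repository_only = TRUE) {
  # Keep the nested list structure rather than simplifying to data frames
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  packages <- lock[["Packages"]]

  if (repository_only) {
    # Drop packages installed from GitHub, local paths, and so on
    from_repo <- vapply(
      packages,
      function(pkg) identical(pkg[["Source"]], "Repository"),
      logical(1)
    )
    packages <- packages[from_repo]
  }

  names(packages)
}

# A tiny stand-in lock file, written to a temporary path for illustration
lock_json <- '{
  "R": {"Version": "4.0.0"},
  "Packages": {
    "plumber": {
      "Package": "plumber", "Version": "1.0.0",
      "Source": "Repository", "Repository": "CRAN"
    },
    "getsysreqs": {
      "Package": "getsysreqs", "Version": "0.0.0.9000",
      "Source": "GitHub"
    }
  }
}'
lock_path <- tempfile(fileext = ".lock")
writeLines(lock_json, lock_path)

lockfile_packages(lock_path)
```

The resulting character vector is what would be passed on as the pkgname values in the API query.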

And with only a little bit of string manipulation, it’s possible to generate install commands:

apt_get_install(
  "renv.lock",
  distribution = "ubuntu",
  release = "20.04"
)

#> [1] "apt-get update -qq && apt-get -y --no-install-recommends install libcurl4-openssl-dev libssl-dev libxml2-dev libssh2-1-dev zlib1g-dev make git libicu-dev pandoc libglpk-dev libgmp3-dev"
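The string manipulation itself is small. A minimal sketch in base R, assuming the system packages have already been retrieved as a character vector; apt_install_command is a hypothetical name, and the apt-get flags are the ones used in the Dockerfile earlier in this post:

```r
# Collapse a character vector of system packages into one apt-get command.
# `apt_install_command` is a hypothetical helper for illustration only.
apt_install_command <- function(sysreqs) {
  sysreqs <- unique(sysreqs)

  # Nothing to install, so return no command at all
  if (length(sysreqs) == 0) {
    return(character(0))
  }

  paste(
    "apt-get update -qq && apt-get -y --no-install-recommends install",
    paste(sysreqs, collapse = " ")
  )
}

apt_install_command(c("libsodium-dev", "libssl-dev", "make", "libssl-dev"))
```

Duplicates are dropped before pasting, since several R packages often share the same system library.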

This isn’t perfect:

  1. Currently this only accepts CRAN dependencies. The RSPM API returns an error when a non-existent package is in the request. An alternative would be to query every package separately and ignore the errors for non-existent packages, but I’m cautious about querying the API too frequently.
  2. Prefixing every dependency with apt-get install is naïve. System dependencies may have commands that need to be run before or after installation (Java, always Java). Fortunately, the RSPM API also tracks these.
  3. Not every Linux distribution uses apt.
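The third point could be handled, very roughly, by choosing the installer from the distribution. The sketch below is only an illustration: the function name, the distribution identifiers and the installer strings are my assumptions rather than anything returned by the RSPM API, and it ignores the pre- and post-install commands from the second point. System package names also differ between distributions, so the names passed in must already be the right ones for the target.

```r
# Choose an install command based on the Linux distribution.
# The identifiers and installer strings below are assumptions made for
# illustration; they are not taken from the RSPM API.
install_command <- function(sysreqs, distribution) {
  installers <- c(
    ubuntu   = "apt-get install -y",
    debian   = "apt-get install -y",
    centos   = "yum install -y",
    redhat   = "yum install -y",
    opensuse = "zypper --non-interactive install",
    sle      = "zypper --non-interactive install"
  )

  if (!distribution %in% names(installers)) {
    stop("No installer known for distribution: ", distribution)
  }

  paste(installers[[distribution]], paste(unique(sysreqs), collapse = " "))
}

install_command(c("openssl-devel", "libcurl-devel"), "centos")
```

A lookup table keeps the distribution-specific detail in one place, though in practice it may be simpler to use the installation commands that RSPM already provides.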

devtools::session_info()

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Ubuntu 20.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_AU:en                    
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2020-10-25                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib
#>  assertthat    0.2.1      2019-03-21 [1]
#>  backports     1.1.10     2020-09-15 [1]
#>  callr         3.4.4      2020-09-07 [1]
#>  cli           2.0.2      2020-02-28 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  desc          1.2.0      2018-05-01 [1]
#>  devtools      2.3.0      2020-04-10 [1]
#>  digest        0.6.25     2020-02-23 [1]
#>  downlit       0.2.0      2020-10-03 [1]
#>  ellipsis      0.3.1      2020-05-15 [1]
#>  evaluate      0.14       2019-05-28 [1]
#>  fansi         0.4.1      2020-01-08 [1]
#>  fs            1.5.0      2020-07-31 [1]
#>  getsysreqs  * 0.0.0.9000 2020-10-25 [1]
#>  glue          1.4.2      2020-08-27 [1]
#>  htmltools     0.5.0      2020-06-16 [1]
#>  hugodown      0.0.0.9000 2020-10-03 [1]
#>  knitr         1.30       2020-09-22 [1]
#>  lifecycle     0.2.0      2020-03-06 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0.9000 2020-05-09 [1]
#>  pkgbuild      1.1.0      2020-07-13 [1]
#>  pkgload       1.1.0      2020-05-29 [1]
#>  prettyunits   1.1.1      2020-01-24 [1]
#>  processx      3.4.4      2020-09-03 [1]
#>  ps            1.3.4      2020-08-11 [1]
#>  purrr         0.3.4      2020-04-17 [1]
#>  R6            2.4.1      2019-11-12 [1]
#>  remotes       2.1.1      2020-02-15 [1]
#>  rlang         0.4.7      2020-07-09 [1]
#>  rmarkdown     2.4.1      2020-10-03 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  stringi       1.5.3      2020-09-09 [1]
#>  stringr       1.4.0      2019-02-10 [1]
#>  testthat      2.3.2      2020-03-02 [1]
#>  usethis       1.9.0.9000 2020-10-10 [1]
#>  vctrs         0.3.4      2020-08-29 [1]
#>  withr         2.3.0      2020-09-22 [1]
#>  xfun          0.18       2020-09-29 [1]
#>  yaml          2.2.1      2020-02-01 [1]
#>  source                                  
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (r-lib/downlit@df73cf3)          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (mdneuzerling/getsysreqs@197b5f1)
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (r-lib/hugodown@fa43e45)         
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (hadley/memoise@4aefd9f)         
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (rstudio/rmarkdown@29aad5e)      
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  Github (r-lib/usethis@195ef14)          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#>  CRAN (R 4.0.0)                          
#> 
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

The image at the top of this page is in the public domain, and was downloaded from Pexels.