Determining system dependencies for R projects
Locking down R package dependencies and versions is a solved problem, thanks to the easy-to-use renv package. System dependencies (those Linux packages that need to be installed to make certain R packages work) are a bit harder to manage.
Option 1: Hard-coding
The easiest option is to hard-code the system dependencies. I did this recently when I was creating a Dockerfile for a very simple Plumber API:
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    make \
    libsodium-dev \
    libicu-dev \
    libcurl4-openssl-dev \
    libssl-dev
My Dockerfile used only three R packages and so its system dependencies were not complicated. There are two ways of determining which packages to install:
- (The bad way) Try to build the Dockerfile and use the errors to determine which dependencies are missing, or
- (The good way) Use the RStudio Package Manager.
The RStudio Package Manager (RSPM) has had a huge impact on my R workflow. It just makes life easier. In this case, it tells me the system dependencies for each R package, as well as the installation commands. System dependencies vary between Linux distributions and releases, and RSPM takes this into account.
Hard-coding system dependencies into Dockerfiles or CI/CD pipelines makes sense for small, throwaway projects, but isn’t a great idea for ongoing, evolving projects. A better option is to determine these dependencies automatically.
Option 2: From package DESCRIPTIONs
There are some great options for determining system dependencies automatically from package DESCRIPTION files. These files contain lists of dependencies for the package, so all that’s needed is an established repository to translate those R dependencies to system dependencies.
RSPM has a public API with a few endpoints for querying system dependencies. The remotes package offers the system_requirements function, which queries the RSPM API for the system dependencies of a package. The package can be on CRAN, or it can be a package under local development; in the latter case, the DESCRIPTION file is used.
An important use-case for this is in creating continuous integration pipelines for a package in development. Every time the package is updated, a fresh environment is used for testing, so system dependencies need to be installed each time. The r-lib actions repository has an example of a standard package check in GitHub Actions that does this:
- name: Install system dependencies
  if: runner.os == 'Linux'
  run: |
    while read -r cmd
    do
      eval sudo $cmd
    done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')
An earlier version of this workflow used the sysreqs package, which calls on https://sysreqs.r-hub.io/ to perform this translation:
- name: Install system dependencies
  if: runner.os == 'Linux'
  env:
    RHUB_PLATFORM: linux-x86_64-ubuntu-gcc
  run: |
    Rscript -e "remotes::install_github('r-hub/sysreqs')"
    sysreqs=$(Rscript -e "cat(sysreqs::sysreq_commands('DESCRIPTION'))")
    sudo -s eval "$sysreqs"
Most of the work in this space has been done by Jim Hester, whose contributions to GitHub Actions for R have made my life much, much easier.
Option 3: From renv lock files
While the remotes::system_requirements function is great for package development, it doesn’t cover every R project. The emerging standard for managing R package dependencies is renv. Given this, and given renv’s capacity to automatically detect package dependencies, it makes sense to explore linking renv lock files to system dependencies.
Before I go on, I’ll just say that I would be very surprised if this hasn’t been done before, or someone isn’t already looking at this.
The RSPM API contains an endpoint for querying system dependencies with a list of packages, rather than a DESCRIPTION file. Here’s an example query:
http://packagemanager.rstudio.com/__api__/repos/1/sysreqs?all=false&pkgname=plumber&pkgname=rmarkdown&distribution=ubuntu&release=20.04
Despite what the Swagger page says, each package name needs its own pkgname= parameter, rather than the names being separated by commas.
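To make the repeated pkgname= convention concrete, here is a minimal base R sketch that assembles such a query. The helper name sysreqs_url and its arguments are hypothetical; only the endpoint and the parameter names are taken from the example query above.

```r
# Hypothetical helper: build an RSPM system-requirements query URL.
# The endpoint and query parameters are copied from the example query;
# nothing here is part of an existing package.
sysreqs_url <- function(packages,
                        distribution = "ubuntu",
                        release = "20.04",
                        base = "http://packagemanager.rstudio.com/__api__/repos/1/sysreqs") {
  stopifnot(is.character(packages), length(packages) > 0)

  # Percent-encode each package name defensively
  encoded <- vapply(
    packages, utils::URLencode, character(1),
    reserved = TRUE, USE.NAMES = FALSE
  )

  # One pkgname= parameter per package, not a comma-separated list
  query <- paste(
    c(
      "all=false",
      paste0("pkgname=", encoded),
      paste0("distribution=", distribution),
      paste0("release=", release)
    ),
    collapse = "&"
  )

  paste0(base, "?", query)
}

sysreqs_url(c("plumber", "rmarkdown"))
```

Called with plumber and rmarkdown, this reproduces the example query string shown earlier.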
The result is JSON that needs to be parsed into something usable. I’ve created a package that does just this. It’s a very low-effort package, so please don’t use it for anything serious. Or better yet, don’t use it at all. But it does show that the RSPM API supports this use-case:
library(getsysreqs)
get_sysreqs(
c("plumber", "rmarkdown"),
distribution = "ubuntu",
release = "20.04"
)
#> [1] "libsodium-dev" "libcurl4-openssl-dev" "libssl-dev"
#> [4] "make" "libicu-dev" "pandoc"
With a little more JSON-parsing, it’s possible to extract the R package dependencies from an renv lock file. Here’s an example from a more complicated project:
get_sysreqs(
"renv.lock",
distribution = "ubuntu",
release = "20.04"
)
#> [1] "libcurl4-openssl-dev" "libssl-dev" "libxml2-dev"
#> [4] "libgit2-dev" "libssh2-1-dev" "zlib1g-dev"
#> [7] "make" "git" "libicu-dev"
#> [10] "pandoc" "libglpk-dev" "libgmp3-dev"
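The lock-file step can be sketched roughly as follows. This is not necessarily how getsysreqs does it: lockfile_packages is a hypothetical helper, it assumes the jsonlite package is installed, and it assumes the usual renv.lock layout of a top-level "Packages" object whose records carry "Package" and "Source" fields. The demo lock file is made up for illustration.

```r
library(jsonlite)

# Hypothetical helper: list the R packages recorded in an renv lock file.
# With repository_only = TRUE, records that were not installed from a
# package repository (e.g. GitHub installs) are dropped. Whether RSPM knows
# every remaining package is an assumption, not a guarantee.
lockfile_packages <- function(path = "renv.lock", repository_only = TRUE) {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  records <- lock[["Packages"]]
  if (length(records) == 0) {
    return(character(0))
  }

  if (repository_only) {
    from_repo <- vapply(
      records,
      function(record) identical(record[["Source"]], "Repository"),
      logical(1)
    )
    records <- records[from_repo]
  }

  # Keep lock-file order; drop the list names
  unname(vapply(records, function(record) record[["Package"]], character(1)))
}

# A made-up, two-package lock file for demonstration
demo_lock <- tempfile(fileext = ".lock")
writeLines('{
  "R": {"Version": "4.0.0"},
  "Packages": {
    "plumber": {"Package": "plumber", "Version": "1.0.0",
                "Source": "Repository", "Repository": "CRAN"},
    "getsysreqs": {"Package": "getsysreqs", "Version": "0.0.0.9000",
                   "Source": "GitHub"}
  }
}', demo_lock)

lockfile_packages(demo_lock)
```

The resulting character vector is the kind of input that can be passed on to a system-requirements query.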
And with only a little bit of string manipulation, it’s possible to generate install commands:
apt_get_install(
"renv.lock",
distribution = "ubuntu",
release = "20.04"
)
#> [1] "apt-get update -qq && apt-get -y --no-install-recommends install libcurl4-openssl-dev libssl-dev libxml2-dev libssh2-1-dev zlib1g-dev make git libicu-dev pandoc libglpk-dev libgmp3-dev"
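The string manipulation involved is small. As a rough sketch (apt_install_command is a hypothetical name, not the function exported by getsysreqs), a vector of system package names can be collapsed into one command like this:

```r
# Hypothetical helper: turn a vector of system package names into a single
# apt-get command, using the same flags as the Dockerfile earlier in the post.
apt_install_command <- function(sysreqs) {
  stopifnot(is.character(sysreqs), length(sysreqs) > 0)
  paste(
    "apt-get update -qq && apt-get -y --no-install-recommends install",
    paste(unique(sysreqs), collapse = " ")
  )
}

apt_install_command(
  c("libsodium-dev", "libcurl4-openssl-dev", "libssl-dev",
    "make", "libicu-dev", "pandoc")
)
```

Duplicates are dropped with unique() because several R packages often share the same system dependency.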
This isn’t perfect:
- Currently this only accepts CRAN dependencies. The RSPM API returns an error when a non-existent package is in the request. An alternative would be to query every package separately and ignore the errors for non-existent packages, but I’m cautious about querying the API too frequently.
- Prefixing every dependency with apt-get install is naïve. System dependencies may have commands that need to be run before or after installation (Java, always Java). Fortunately, the RSPM API also tracks these.
- Not every Linux distribution uses apt.
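The last two points could be handled along these lines. This is only a sketch under stated assumptions: install_commands is a hypothetical helper, the sysreqs vector must already hold the package names for the chosen distribution, and the pre- and post-install commands are supplied by the caller rather than read from the RSPM response, whose exact shape I haven't reproduced here. The Java example is illustrative.

```r
# Hypothetical helper: choose an installer by distribution and wrap the
# install command with optional pre- and post-install commands.
install_commands <- function(sysreqs,
                             distribution,
                             pre_install = character(0),
                             post_install = character(0)) {
  stopifnot(is.character(sysreqs), length(sysreqs) > 0)

  installer <- switch(
    distribution,
    ubuntu = ,
    debian = "apt-get install -y",
    centos = ,
    redhat = "yum install -y",
    opensuse = ,
    sle = "zypper install -y",
    stop("Unsupported distribution: ", distribution)
  )

  # Commands are returned in the order they should be run
  c(
    pre_install,
    paste(installer, paste(unique(sysreqs), collapse = " ")),
    post_install
  )
}

# Illustrative only: a Java dependency that needs a follow-up command
install_commands(
  "default-jdk",
  distribution = "ubuntu",
  post_install = "R CMD javareconf"
)
```

Returning one command per element keeps the output easy to feed into a Dockerfile RUN instruction or a CI step.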
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Ubuntu 20.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-10-25
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package     * version    date       lib source
#> assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
#> backports     1.1.10     2020-09-15 [1] CRAN (R 4.0.0)
#> callr         3.4.4      2020-09-07 [1] CRAN (R 4.0.0)
#> cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)
#> crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)
#> desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)
#> devtools      2.3.0      2020-04-10 [1] CRAN (R 4.0.0)
#> digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)
#> downlit       0.2.0      2020-10-03 [1] Github (r-lib/downlit@df73cf3)
#> ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)
#> fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)
#> fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.0)
#> getsysreqs  * 0.0.0.9000 2020-10-25 [1] Github (mdneuzerling/getsysreqs@197b5f1)
#> glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.0)
#> htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.0)
#> hugodown      0.0.0.9000 2020-10-03 [1] Github (r-lib/hugodown@fa43e45)
#> knitr         1.30       2020-09-22 [1] CRAN (R 4.0.0)
#> lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)
#> memoise       1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)
#> pkgbuild      1.1.0      2020-07-13 [1] CRAN (R 4.0.0)
#> pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)
#> processx      3.4.4      2020-09-03 [1] CRAN (R 4.0.0)
#> ps            1.3.4      2020-08-11 [1] CRAN (R 4.0.0)
#> purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
#> R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)
#> remotes       2.1.1      2020-02-15 [1] CRAN (R 4.0.0)
#> rlang         0.4.7      2020-07-09 [1] CRAN (R 4.0.0)
#> rmarkdown     2.4.1      2020-10-03 [1] Github (rstudio/rmarkdown@29aad5e)
#> rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
#> stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.0)
#> stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
#> testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)
#> usethis       1.9.0.9000 2020-10-10 [1] Github (r-lib/usethis@195ef14)
#> vctrs         0.3.4      2020-08-29 [1] CRAN (R 4.0.0)
#> withr         2.3.0      2020-09-22 [1] CRAN (R 4.0.0)
#> xfun          0.18       2020-09-29 [1] CRAN (R 4.0.0)
#> yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
The image at the top of this page is in the public domain, and was downloaded from Pexels.