Some Dockerfiles for Building R Package Binaries

I went down a strange path recently, trying to compile binaries of R packages for Linux. I’m not sure why — this area is pretty much covered by the RStudio Package Manager. I’ll leave my Dockerfiles here in case they’re of any use to a future wayward R programmer.

The intention here is to build a Docker image that can build an R package binary with the command below. I’m building x86 binaries on my ARM MacBook, so I specify the platform during both build and run.

docker run --platform linux/amd64 -v ~/packages:/packages $IMAGE $PACKAGE $VERSION

This will output the compiled binary into a subdirectory of ~/packages corresponding to the target version of R. These binaries are not portable: they depend very much on the Linux distribution used to build them.
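A minimal sketch of how the build and run steps fit together, assuming the Dockerfile sits in the current directory. The image tag r-binary-builder, the DRY_RUN switch and the function names are arbitrary placeholders.

```shell
#!/bin/bash
# Sketch only: IMAGE, DRY_RUN and the function names are placeholders.
IMAGE="${IMAGE:-r-binary-builder}"
PLATFORM="${PLATFORM:-linux/amd64}"
OUTPUT_DIR="${OUTPUT_DIR:-$HOME/packages}"

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the image from the Dockerfile in the current directory.
build_image() {
    run docker build --platform "$PLATFORM" -t "$IMAGE" .
}

# Build one package; the version argument is optional.
build_package() {
    local package=$1
    local version=${2:-}
    run docker run --rm --platform "$PLATFORM" \
        -v "$OUTPUT_DIR:/packages" "$IMAGE" "$package" ${version:+"$version"}
}
```

After sourcing this, build_image followed by build_package glue 1.4.2 would build the image and then the package. Setting DRY_RUN=1 prints the docker commands instead of running them.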

Method 1: conda-build

conda is a package manager mostly associated with Python, but it can also be used for R and other languages.

The Dockerfile below installs Miniconda and conda-build, which it uses to build the R package binaries. These are binaries that must be installed with conda, rather than through R directly.

I use mamba and boa, which provide faster alternatives to conda install and conda build, respectively.

Every time conda/mamba builds an R package, it fetches all dependencies from scratch. To speed this up, I install R in the docker build process so that it’s cached. Finally I hardcode the script that’s used to build the R package, depending on whether a version is specified.

ARG OS_IDENTIFIER=ubuntu
ARG OS_TAG=20.04
ARG PLATFORM=linux/amd64

FROM --platform=${PLATFORM} ${OS_IDENTIFIER}:${OS_TAG} 

ENV LANG=en_US.UTF-8

RUN apt-get update && apt-get install -y curl

# Install Miniconda and conda-build, which is needed to compile R packages
# for conda-forge 
ARG MINICONDA_VERSION=py38_4.9.2
ARG MINICONDA_INSTALLER=Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh
RUN curl -LO https://repo.anaconda.com/miniconda/${MINICONDA_INSTALLER} \
    && bash ${MINICONDA_INSTALLER} -p /miniconda -b \
    && rm ${MINICONDA_INSTALLER}
ENV PATH=/miniconda/bin:${PATH}
RUN conda install -y conda-build

# Mamba is much faster for installing packages, and boa lets us use it
# when building packages
RUN conda install -y -c conda-forge mamba boa

# conda-build (and its mamba equivalent) will always reach out to a repository
# to install dependencies, rather than using pre-installed packages. However,
# by installing r-base now we can cache the required packages, so that R
# doesn't have to be downloaded each time a package is built.
ENV R_VERSION=4.0.3
RUN mamba install -y -c conda-forge r-base=${R_VERSION}

# Compiled packages are outputted to this directory. When this container is run,
# /packages can be used as a target for -v 
RUN mkdir -p /packages/R-${R_VERSION}

RUN echo "#!/bin/bash" > build_r_package.sh \
  && echo ' \n\
package=$1 \n\
version=$2 \n\
if [[ -n "$2" ]]; then \n\
    echo "Building r-$package-$version" \n\
    conda skeleton cran --version $version $package \n\
    conda mambabuild --R ${R_VERSION} -c conda-forge --output-folder /packages/R-${R_VERSION} r-$package-$version \n\
else \n\
    echo "Building r-$package" \n\
    conda skeleton cran $package \n\
    conda mambabuild --R ${R_VERSION} -c conda-forge --output-folder /packages/R-${R_VERSION} r-$package \n\
fi ' >> build_r_package.sh \
  && chmod +x build_r_package.sh

ENTRYPOINT ["/build_r_package.sh"]

Even with mamba this is a slow process — it takes over 10 minutes to compile the glue package, which has minimal dependencies.
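A rough sketch of installing one of these packages afterwards, assuming the output folder can be used as a local conda channel (conda-build normally indexes it) and that the install happens on a matching Linux x86_64 conda. The function names and the default /packages root are placeholders.

```shell
# Sketch only: the function names and default paths are placeholders.

# Print the file:// channel URL for the output folder of a given R version.
local_channel() {
    local r_version=$1
    local root=${2:-/packages}
    echo "file://$root/R-$r_version"
}

# Install a locally built package (for example r-glue), taking its
# dependencies from conda-forge.
install_local() {
    local conda_package=$1
    local r_version=$2
    local root=${3:-/packages}
    conda install -y -c "$(local_channel "$r_version" "$root")" \
        -c conda-forge "$conda_package"
}
```

For example, install_local r-glue 4.0.3 would look for r-glue in /packages/R-4.0.3 first.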

Method 2: Just R

Using just R requires a bit more logic. I’ve separated out some R helper scripts, as well as the bash script that does the actual building. I start with a rocker image, which already has R installed. I also need the remotes package to install package dependencies.

FROM rocker/r-ver:4.0.3

RUN apt-get update && apt-get install -y curl

RUN Rscript -e 'install.packages("remotes")'

ENV R_VERSION=4.0.3
RUN mkdir -p /packages/R-${R_VERSION}
RUN mkdir /scripts

ADD helpers.R /scripts/helpers.R
ADD build-R-package.sh /scripts/build-R-package.sh
RUN chmod +x /scripts/build-R-package.sh

ENTRYPOINT ["/scripts/build-R-package.sh"]

The R helper functions I need query CRAN to determine the latest available version of a package. If the desired version is not the latest, then the source needs to be downloaded from the CRAN archives.

cran_version <- function(package) {
  # Fall back to the cloud mirror when no CRAN repository has been configured
  repos <- getOption("repos")
  if (is.null(repos) || any(repos == "@CRAN@")) {
    options(repos = c(CRAN = "https://cloud.r-project.org/"))
  }
  available <- as.data.frame(available.packages())
  filtered <- available[available$Package == package,]
  if (nrow(filtered) != 1) {
    stop(package, " is not available on CRAN")
  }
  filtered$Version
}

cran_source_url <- function(package, version = NULL) {
  if (is.null(version)) {
    version <- cran_version(package)
    latest_version <- TRUE
  } else {
    latest_version <- (version == cran_version(package))
  }
  bundle <- paste0(package, "_", version, ".tar.gz")
  if (latest_version) {
    paste0("https://cran.r-project.org/src/contrib/", bundle)
  } else {
    paste0("https://cran.r-project.org/src/contrib/Archive/", package, "/", bundle)
  }
}
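The same current-or-archive decision can also be made without starting R, by probing both locations with curl and taking the first one that exists. This is a different technique from the helpers above, sketched here with made-up function names, and it needs an explicit version.

```shell
# Sketch only: the function names are placeholders, not part of helpers.R.
CRAN_SRC="https://cran.r-project.org/src/contrib"

# URL of a source bundle if it is the current CRAN release.
current_url() { echo "$CRAN_SRC/${1}_${2}.tar.gz"; }

# URL of a source bundle if it has been moved to the archive.
archive_url() { echo "$CRAN_SRC/Archive/${1}/${1}_${2}.tar.gz"; }

# True when an HTTP HEAD request for the URL succeeds.
url_exists() {
    curl --silent --fail --head --location --output /dev/null "$1"
}

# Print the first URL that exists, trying the current release first.
resolve_source_url() {
    local package=$1
    local version=$2
    local url
    for url in "$(current_url "$package" "$version")" \
               "$(archive_url "$package" "$version")"; do
        if url_exists "$url"; then
            echo "$url"
            return 0
        fi
    done
    echo "No source found for $package $version" >&2
    return 1
}
```

Calling resolve_source_url glue 1.4.2 prints whichever of the two URLs responds, and returns a non-zero status if neither does.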

The bash script calls on the helpers as needed. If no version is specified, the latest version is used. Then the source is downloaded from CRAN and the package is built. It’s also installed — building and installing are closely related with R. Finally the resulting binary is moved to the packages directory.

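#!/bin/bash
# Usage: build-R-package.sh PACKAGE [VERSION]
# Stop at the first failing step
set -e

package=$1
version=$2
# Default to the latest version on CRAN when none is given
if [[ -z "$version" ]]; then
    version=$(Rscript -e "source('/scripts/helpers.R');cat(cran_version('$package'))")
fi
url=$(Rscript -e "source('/scripts/helpers.R');cat(cran_source_url('$package', '$version'))")
echo "Downloading $url"
# Download into / so that the tarball path used below is predictable
cd /
curl -fLO "$url"

# Install the dependencies of the downloaded source bundle
Rscript -e "remotes::install_deps('/${package}_${version}.tar.gz')"

# Build in a scratch directory, then move the binary to the output directory
mkdir binary && cd binary
R CMD INSTALL --build "/${package}_${version}.tar.gz"
mv * "/packages/R-${R_VERSION}"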
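As a sketch of what ends up in the output directory: R CMD INSTALL --build names the binary after the package, its version and the platform string R reports. The x86_64-pc-linux-gnu default below is an assumption about that string, and the function names are placeholders.

```shell
# Sketch only: the function names and default values are assumptions.

# Print the expected path of a binary produced by R CMD INSTALL --build.
binary_path() {
    local package=$1
    local version=$2
    local r_version=${3:-4.0.3}
    local platform=${4:-x86_64-pc-linux-gnu}
    echo "/packages/R-$r_version/${package}_${version}_R_${platform}.tar.gz"
}

# Install a previously built binary. R CMD INSTALL does not resolve
# dependencies, so they need to be installed already.
install_binary() {
    R CMD INSTALL "$(binary_path "$@")"
}
```

For example, install_binary glue 1.4.2 would install /packages/R-4.0.3/glue_1.4.2_R_x86_64-pc-linux-gnu.tar.gz on a machine running the same distribution and R version.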

The image at the top of this page is in the public domain

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2021-04-19                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                            
#>  callr         3.6.0      2021-03-28 [1] CRAN (R 4.0.3)                    
#>  cli           2.4.0      2021-04-05 [1] CRAN (R 4.0.2)                    
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.2)                    
#>  desc          1.3.0      2021-03-05 [1] CRAN (R 4.0.2)                    
#>  devtools      2.3.2      2020-09-18 [1] CRAN (R 4.0.2)                    
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                    
#>  downlit       0.2.1      2020-11-04 [1] CRAN (R 4.0.2)                    
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                    
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.1)                    
#>  fansi         0.4.2      2021-01-15 [1] CRAN (R 4.0.2)                    
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                    
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                    
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)                    
#>  hugodown      0.0.0.9000 2021-04-19 [1] Github (r-lib/hugodown@97ea0cd)   
#>  knitr         1.32       2021-04-14 [1] CRAN (R 4.0.2)                    
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.2)                    
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                    
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.2)                    
#>  pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.0.2)                    
#>  pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.2)                    
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.2)                    
#>  processx      3.5.1      2021-04-04 [1] CRAN (R 4.0.2)                    
#>  ps            1.6.0      2021-02-28 [1] CRAN (R 4.0.2)                    
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                    
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                    
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                    
#>  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                    
#>  rmarkdown     2.7.10     2021-04-19 [1] Github (rstudio/rmarkdown@eb55b2e)
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.0.2)                    
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                    
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                    
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                    
#>  testthat      3.0.1      2020-12-17 [1] CRAN (R 4.0.2)                    
#>  usethis       2.0.1      2021-02-10 [1] CRAN (R 4.0.2)                    
#>  vctrs         0.3.7      2021-03-29 [1] CRAN (R 4.0.2)                    
#>  withr         2.4.2      2021-04-18 [1] CRAN (R 4.0.3)                    
#>  xfun          0.22       2021-03-11 [1] CRAN (R 4.0.2)                    
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                    
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library