What I've learnt about making an R package
The last few weeks have been all about R package development for me. First I was exploring GitHub Actions with the lovely people at the rOpenSci OzUnconf, and then I was off to San Francisco to learn about Building Tidy Tools with the Wickham siblings. I’ve picked up a lot about package development, so I’m documenting some of the trickier things that I’ve learnt.
A great resource for package development is Hadley’s book. Check it out.
General hints
- There’s a difference between attaching and loading a package. Attaching a package also loads it, and this is what happens when you run the `library` function.
  - I’ve heard it claimed before that it’s inefficient to use double colons like `readr::read_csv()` because R “loads the package every time”. That’s not true. R loads the package on the first use of a double colon function, but doesn’t attach it. For every call afterwards, that package has already been loaded.
- Running `devtools::check()` (as opposed to `R CMD check`) will automatically run `devtools::document()` before the package check if devtools can see that you’re using Roxygen. This isn’t always obvious for a brand new package, so I found that I had to run `devtools::document()` initially. After that first command, however, `devtools::check()` started documenting.
- Use the `@keywords internal` Roxygen tag for documented internal functions, that is, functions without an `@export` tag. This partially hides the documentation from regular users, while still allowing interested users and developers to access it. Thank you to the #rstats Twitter for helping me out with this one!
- If you want to skip the portion of a package check where R connects to CRAN (because you’re behind a proxy or you don’t have an internet connection) run `rcmdcheck::rcmdcheck(repos = FALSE)`. This lets you continue package development on planes.
- When you use a function like `mutate(mtcars, kpl = 0.425144 * mpg)`, the package check will complain because it’s expecting to see `mtcars` and `mpg` as global variables. It’s just a note so you can ignore it, but if you’re like me and interpret notes in package checks as personal attacks, then this StackExchange post has some options.
  - Personally, I just add `utils::globalVariables(c("mtcars", "mpg"))` to a `globals.R` file in my `R` folder. If there’s a chance someone else will be looking at your source code, you should add a (non-Roxygen) comment explaining why so they don’t get confused.
- If you’re creating a custom package for a specific data set, I recommend creating a `download_data` function that creates the `inst/extdata` folder and downloads the external data only if it doesn’t already exist. You can add `inst/extdata` to your `.gitignore`. This means that you can host your source code but not your data (which may be quite large) on git, without having to worry about doing multiple redundant downloads. This will also let you delete your R Markdown cache without deleting your local data.
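As a rough sketch of the `download_data` idea: the `url` and `dest_dir` arguments and the one-file-per-call design below are assumptions made for illustration, not something taken from a real package.

```r
# Hypothetical helper: the argument names and file naming are illustrative only.
download_data <- function(url, dest_dir = file.path("inst", "extdata")) {
  # Create the data folder on first use; leave it alone if it already exists.
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }

  # Name the local copy after the last component of the URL.
  dest_file <- file.path(dest_dir, basename(url))

  # Only download when there is no local copy yet.
  if (!file.exists(dest_file)) {
    utils::download.file(url, destfile = dest_file, mode = "wb")
  }

  # Return the local path invisibly so callers can read the file.
  invisible(dest_file)
}
```

Calling this twice with the same URL should only download the file the first time; the second call finds the local copy and returns its path straight away.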
Managing dependencies
- If your package uses another package, add it to your DESCRIPTION file with `usethis::use_package(package)`. By default, this will add the package to the “Imports” section. This is probably what you want, but you can put it in another section by changing the `type` argument in `use_package` to “Depends” or “Suggests”.
  - “Imports” in the DESCRIPTION file is not the same as the `import()` and `importFrom()` directives in the NAMESPACE file. The DESCRIPTION file doesn’t really “import” anything in the namespace sense; it just tells R that those packages should be installed.
- Once you’ve added a package to your DESCRIPTION file, you can use it in your functions in one of three ways. They are, in order of preference (thank you to Hadley for confirming this at the workshop):
  - Use a double colon like `dplyr::mutate`. This is the preferred option since it doesn’t change the namespace of your package.
  - Add a Roxygen tag `@importFrom`, such as `@importFrom dplyr mutate`. This adds the `mutate` function to the namespace available to your package functions without anything else from the `dplyr` package. This allows you to use `mutate` by itself in your package functions. Because this only adds a single function at a time, you’re unlikely to encounter a namespace collision (two objects with the same name in the namespace).
  - Add a Roxygen tag `@import`, such as `@import dplyr`. This adds every exported function in the `dplyr` package to the namespace available to your package functions. This isn’t recommended because it makes it very easy to run into a namespace collision.
    - Something Hadley said: it’s okay to do this if you’re using a package that’s been explicitly designed to be imported in this manner. For example, tidyverse packages import the entire `rlang` namespace. It’s also more acceptable to do this if you’re only importing the entire namespace of a single package, since that’s unlikely to lead to conflicts.
- You can use the same `@import` or `@importFrom` tag multiple times, and Roxygen will only add it once to the NAMESPACE. I like to keep my dependencies close to the functions that use them, so if I need a pipe in many functions I’ll put `@importFrom magrittr %>%` in each file. That’s just my personal preference, though.
- Once you’ve added either an `@importFrom` or `@import` tag, you need to run `devtools::document()` for the change to take effect. This will add the relevant lines to your NAMESPACE file. `devtools::check()` will usually do this for you.
  - This often catches me out if I change the NAMESPACE through a Roxygen tag and then re-install the package without running `devtools::document()` or `devtools::check()`. I get frustrated that my changes aren’t taking effect, until I realise my mistake!
- The `base` package never needs to be explicitly referred to or mentioned in your DESCRIPTION file.
  - You do need to explicitly refer to functions and objects from the `utils` and `stats` packages, e.g. `stats::var(c(2, 3, 3, 3, 4))`, and put them in your DESCRIPTION file.
- Packages used in your vignettes that are not used in your package itself should go in the “Suggests” section of your DESCRIPTION file, e.g. `usethis::use_package("ggplot2", type = "Suggests")`.
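To make the first two ways of using a dependency concrete, here is a minimal sketch built on `stats::var`, chosen because `stats` ships with R. The function names `spread_explicit` and `spread_imported` are made up for illustration.

```r
# Option 1: double colon. Nothing is added to the package NAMESPACE.
spread_explicit <- function(x) {
  stats::var(x)
}

# Option 2: import a single function with a Roxygen tag. After running
# devtools::document(), the NAMESPACE file should contain the line
#   importFrom(stats,var)
# which lets the function body call var() without a prefix.

#' Variance of a numeric vector
#'
#' @param x A numeric vector.
#' @importFrom stats var
#' @keywords internal
spread_imported <- function(x) {
  var(x)
}
```

Either way, `stats` still has to be listed under “Imports” in the DESCRIPTION file of the package that contains these functions.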
Importing S3 methods
There’s something that still confuses me. Let me preface this by saying that I’m not going to pretend to understand S3.
Suppose that one of your dependencies provides an S3 method for a generic. For example, the `randomForest` package has an (unexported) `predict.randomForest` S3 method that allows you to make predictions with new data using the `predict` generic. How do you deal with that dependency without importing the whole `randomForest` namespace?
I created a quick package to test this out. You can find it on GitHub.
The option I went with here is to call on the internal function directly: `randomForest:::predict.randomForest()`. This works, but you’ll get a note because you’re generally not supposed to use `:::` in functions. If you’re submitting to CRAN, this could be an issue. Here are the results of `R CMD check`:
── R CMD check results ─────────────────── ImportingRandomForest 0.0.0.9000 ────
Duration: 11.3s
❯ checking dependencies in R code ... NOTE
Unexported object imported by a ':::' call: ‘randomForest:::predict.randomForest’
See the note in ?`:::` about the use of this operator.
0 errors ✔ | 0 warnings ✔ | 1 note ✖
Alternatively, the `stats::predict` generic seems to work with the Roxygen tag `@importFrom randomForest randomForest`. I feel like this is a fluke: the S3 method is imported as a consequence of importing the `randomForest` function, but it’s not clear that this is happening. Similarly, `@importMethodsFrom randomForest predict.randomForest` seems to work, even though R throws a warning that it couldn’t find the method.
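One possible explanation (an assumption, not a confirmed diagnosis of the test package) is that S3 methods are registered when a package’s namespace is loaded, so a generic can find an unexported method without it ever being imported by name. The sketch below illustrates the idea with the `splines` package that ships with R rather than `randomForest`: `bs` and its `predict` method stand in for `randomForest` and `predict.randomForest`.

```r
# Calling a function with `::` loads the splines namespace.
basis <- splines::bs(1:10, df = 3)

# Is the namespace loaded? Is the package attached to the search path?
print("splines" %in% loadedNamespaces())
print("package:splines" %in% search())

# Is the method exported under its own name?
print("predict.bs" %in% getNamespaceExports("splines"))

# Loading the namespace registered the S3 method, so it can be looked up...
method <- utils::getS3method("predict", "bs")
print(is.function(method))

# ...and the generic dispatches to it without `:::` or an explicit import.
new_basis <- stats::predict(basis, newx = c(2.5, 3.5))
print(dim(new_basis))
```

If the same mechanism applies to `randomForest`, then anything that loads its namespace, including `randomForest::randomForest()` or an `@importFrom` tag, would be enough for `stats::predict` to dispatch correctly.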
I’d welcome any thoughts on this!
The featured image for this post is from pixabay, and is used under the Simplified Pixabay License.
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Ubuntu 20.04 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-06-13
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> hugodown 0.0.0.9000 2020-06-12 [1] Github (r-lib/hugodown@6812ada)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)
#> pkgbuild 1.0.7 2020-04-25 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.2.3 2020-06-12 [1] Github (rstudio/rmarkdown@4ee96c8)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.14 2020-05-20 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library