Locking down R package dependencies and versions is a solved problem, thanks to the easy-to-use renv package. System dependencies, the Linux packages that need to be installed to make certain R packages work, are a bit harder to manage.
drake is a package for orchestrating R workflows. Suppose I have some data in S3 that I want to pull into R through a drake plan. In this post I’ll use the S3 object’s ETag so that drake re-downloads the data only if it has changed.
After I posted my efforts to use MLflow to serve a model with R, I was worried that people might think I don’t like MLflow. I want to declare this: MLflow is awesome. I’ll showcase its model tracking features, and how to integrate them into a
There’s always a need for more tidymodels examples on the Internet. Here’s a simple machine learning model using the recent coffee Tidy Tuesday data set. The plot above gives the approach: I’ll define some preprocessing and a model, optimise some hyperparameters, and fit and evaluate the result. And I’ll piece all of the components together using targets, an experimental alternative to the drake package that I love so much.
I’m obsessed with how to structure a data science project. The time I spend worrying about project structure would be better spent on actually writing code. Here’s my preferred R workflow, and a few notes on Python as well.