Blog posts

Machine Learning Pipelines with Tidymodels and Targets

R
There’s always a need for more tidymodels examples on the Internet. Here’s a simple machine learning model using the recent coffee Tidy Tuesday data set. The plot above gives the approach: I’ll define some preprocessing and a model, optimise some hyperparameters, and fit and evaluate the result. And I’ll piece all of the components together using targets, an experimental alternative to the drake package that I love so much. As usual, I don’t care too much about the model itself.
Data Science Workflows

R
I’m obsessed with how to structure a data science project. The time I spend worrying about project structure would be better spent on actually writing code. Here’s my preferred R workflow, and a few notes on Python as well. The R package workflow: In R, the package is “the fundamental unit of shareable code”. At rstudio::conf 2020, Hadley gave a rule of thumb for when to create a package, which I’ll paraphrase: “When you copy and paste a block of code three times, make a function.
Bootstrapping R functions

R
Suppose I want a function that runs some setup code before it runs the first time. Maybe I’m using dplyr but I haven’t properly declared all of my dplyr calls in my function, so I want to run library(dplyr) before the actual function is run. Or maybe I want to install a package if it isn’t already installed, or restore a renv file, or any other setup process. I only want this special code to run the first time my function is called.
Deploying R Models with MLflow and Docker

R
MLflow is a platform for the “machine learning cycle”. It’s a suite of tools for managing models, with tracking of hyperparameters and metrics, a registry of models, and options for serving. It’s this last bit that I’m going to focus on today. I haven’t been able to find much discussion or documentation about MLflow’s support for R. There’s the RStudio MLflow example, but I wanted to see if I could use MLflow to serve something more complex.
MLOps with GitHub Actions and R

R
As of 2023 the material in this post no longer functions due to changes in GitHub Actions. Machine learning models get stuck at the deployment stage all the time. This stuff is hard. GitHub Actions is a tool for automating tasks associated with a repository. I wanted to see if I could implement some sort of end-to-end automatic training, deployment and execution of a model. And I’m going to use R because people keep telling me that this sort of stuff can’t be done with R.
Upgrade your workflow with drake

R
Drake is my new favourite R package. Drake is a tool for orchestrating complicated workflows. You piece together a plan based on some high-level, abstract functions. These functions should be pure — they need to be defined by their inputs only, not relying on any predefined variables that aren’t in the function signature. Then, drake will take the steps in that plan and work out how to run it. Here’s how I’ve defined the plan above:
Printing data frames with metadata

R
I’m creating an R API wrapper around my state’s public transport service. To make life easier for the users, the responses from the API calls are parsed and returned as tibbles/data frames. To make life easier for me, I need to keep track of the API call behind each tibble. I do this by using the tibble::new_tibble() function to attach metadata to the tibble as attributes, and creating a custom print method to make the metadata visible.
What I've learnt about making an R package

R
The last few weeks have been all about R package development for me. First I was exploring GitHub Actions with the lovely people at the rOpenSci OzUnconf, and then I was off to San Francisco to learn about Building Tidy Tools with the Wickham siblings. I’ve picked up a lot about package development, so I’m documenting some of the trickier things that I’ve learnt. A great resource for package development is Hadley’s book.
Model as a package

R
There’s a concept in R of an analysis as a package, in which everything you need for your data analysis is contained within a custom package. When you install the package and build the vignettes, the data analysis is performed and the results are saved as a pretty HTML or PDF file, generated with R Markdown. I wanted to extend this concept to a machine learning model as a package. The idea here is that, using vignettes, we can make installing a package equivalent to training a model.
An update on copyright and licencing

When I started this blog I wanted a way to share the quick little projects that distract me. I gave some thought to licencing, but I wanted to make sure that people could use my code if it had any value to them. This is just a little blog by a very unimportant guy—if someone got some use out of my code, I would be flattered! However, in the last few days I’ve seen some unwelcome behaviour.