David Neuzerling

David Neuzerling

Data, Maths, R

I’m obsessed with how to structure a data science project. The time I spend worrying about project structure would be better spent on actually writing code. Here’s my preferred R workflow, and a few notes on Python as well.

Suppose I want a function that runs some setup code before it runs the first time. Maybe I’m using dplyr but I haven’t properly declared all of my dplyr calls in my function, so I want to run library(dplyr) before the actual function is run. Or maybe I want to install a package if it isn’t already installed, or restore a renv file, or any other setup process. I only want this special code to run the first time my function is called. After that, the function that runs should be…

MLflow is a platform for the “machine learning cycle”. It’s a suite of tools for managing models, with tracking of hyperparameters and metrics, a registry of models, and options for serving. It’s this last bit that I’m going to focus on today.

Recent posts


Powered by Hugo and hugodown.

This content of this blog is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License except where stated otherwise.