mdneuzerling

Wrangling LLM output with LangChain

The toughest prediction any data scientist makes is deciding which tools are worth learning. The explosion of generative AI only makes this harder. My prediction: LangChain is here to stay or, at least, the patterns behind it are. LangChain’s job is to drag non-deterministic GenAI outputs into a deterministic world. Putting GenAI into real workloads demands the right mindset from a data scientist. We’re not training the models that generate the answers; we’re taking those answers and forcing them into a tightly defined realm.
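To make "forcing answers into a tightly defined realm" concrete, here is a minimal sketch in plain Python. It is not LangChain's API; the `Sentiment` type, the `parse_llm_output` function, and the allowed labels are all hypothetical names chosen for illustration. The idea is simply that free-form model text either becomes a strictly typed value or is rejected.

```python
import json
from dataclasses import dataclass

# Hypothetical closed set of answers the rest of the program can handle.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Sentiment:
    label: str
    confidence: float


def parse_llm_output(raw: str) -> Sentiment:
    """Turn free-form model text into a Sentiment, or raise ValueError."""
    # Models often wrap JSON in extra prose, so keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON in model output: {exc}") from exc

    # Normalise the label, then insist it belongs to the closed set.
    label = str(data.get("label", "")).strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")

    # The confidence must be a number between 0 and 1.
    try:
        confidence = float(data["confidence"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("confidence must be a number") from exc
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")

    return Sentiment(label, confidence)
```

Anything downstream of `parse_llm_output` only ever sees a valid `Sentiment`; a caller would typically catch the `ValueError` and retry the model or fall back to a default.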

Why I no longer code in my free time

I’m less insecure about my career

Those of us who write code are an insecure bunch. Technology moves faster than any one human being can keep up with. The threat of falling behind is real and can impact careers and earnings, and that’s a terrifying thing. I studied a lot of tech concepts in my free time to try to stay relevant. I have a Kubernetes cluster sitting in my living room that, looking back, exists only because of a misprediction that I would need to know Kubernetes to stay employable.

Plotting commute times with R and Google Maps

I’m house-hunting, and while I’d love to buy a 5-bedroom house with a pool a 10-minute walk from Flinders Street Station, I probably can’t afford that. So I need to take a broader look at Melbourne. One of the main constraints is commute time. I built a choropleth of commute times to the University of Melbourne and put it on top of a map of Melbourne. The rough idea is to create a fine hexagonal grid across the city using the sf package, and then to pass the centre of each hexagon through the Google Maps Distance Matrix API with the help of the (Melbourne-made) googleway package.

Exemplar: a prototype R package for data validation

I’ve been playing around with an idea for a new R package. I call it exemplar and here’s how it works: I provide an example of what data should look like — an exemplar. The package gives a function that checks to make sure that any new data looks the same. The generated function checks — for each column — duplicate values, missing values, ranges, and more. The validation function doesn’t have any dependencies at all.
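The idea can be sketched in a few lines. This is an illustration in Python rather than R, and it is not how exemplar itself is implemented; `make_validator` and the dict-of-columns table shape are assumptions made for the example. Rules are inferred from the example data, and the returned function reports every column that breaks them.

```python
from typing import Any, Callable

# A table is modelled as a dict of column name -> list of values (assumption).
Table = dict[str, list[Any]]


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def make_validator(exemplar: Table) -> Callable[[Table], list[str]]:
    """Infer per-column rules from example data and return a checking function."""
    rules = {}
    for name, values in exemplar.items():
        present = [v for v in values if v is not None]
        numeric = bool(present) and all(_is_number(v) for v in present)
        rules[name] = {
            # Only tolerate what the exemplar itself contains.
            "allow_missing": len(present) < len(values),
            "allow_duplicates": len(set(present)) < len(present),
            "range": (min(present), max(present)) if numeric else None,
        }

    def validate(data: Table) -> list[str]:
        problems = []
        for name, rule in rules.items():
            if name not in data:
                problems.append(f"{name}: column is absent")
                continue
            values = data[name]
            present = [v for v in values if v is not None]
            if not rule["allow_missing"] and len(present) < len(values):
                problems.append(f"{name}: has missing values")
            if not rule["allow_duplicates"] and len(set(present)) < len(present):
                problems.append(f"{name}: has duplicate values")
            if rule["range"] is not None:
                low, high = rule["range"]
                if any(not _is_number(v) or not low <= v <= high for v in present):
                    problems.append(f"{name}: has values outside [{low}, {high}]")
        return problems

    return validate
```

Calling `make_validator` once on trusted data gives a function that can be run on every new batch; an empty list means the batch looks like the exemplar.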

My Machine Learning Process (Mistakes Included)

When I train a machine learning model in a blog post, I edit out all the mistakes. I make it seem like I had the perfect data I needed from the very start, and I never add a useless feature. This time, I want to train a model with all the mistakes and fruitless efforts included. My goal here is to describe my process of creating a model rather than just presenting the final code.

Advent of Code 2021: A Julia Journal - Part 2

Advent of Code is an advent calendar for programming puzzles. I decided to tackle this year’s set of 50 puzzles in Julia and journal my experiences along the way. I’m a beginner in Julia so I thought this would help me improve my skills. This post covers days 9 through 16.

Day 9: Bracket matching

“Syntax error in navigation subsystem on line: all of them”

I over-engineered the heck out of this puzzle.

Advent of Code 2021: A Julia Journal - Part 1

Advent of Code is an advent calendar for programming puzzles. I decided to tackle this year’s set of 50 puzzles in Julia and journal my experiences along the way. I’m a beginner in Julia so I thought this would help me improve my skills. This post covers days 1 through 8. All of my solutions are available on GitHub.

Day 1: Increasing sequences

“Count the number of times a depth measurement increases from the previous measurement”
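The Day 1 task comes down to comparing each reading with the one before it. A minimal sketch follows, in Python rather than Julia; it is not the solution from the post, and the function name and sample readings are made up for illustration.

```python
def count_increases(depths: list[int]) -> int:
    """Count how many readings are strictly larger than the previous one."""
    # zip pairs each reading with its successor: (d0, d1), (d1, d2), ...
    return sum(later > earlier for earlier, later in zip(depths, depths[1:]))


# Made-up readings: only 3 -> 5 and 4 -> 8 are increases.
print(count_increases([3, 5, 5, 4, 8]))  # prints 2
```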

Animated Unicode Plots with Julia

I love Julia’s UnicodePlots.jl, a package for making pretty, colourful plots directly in the terminal. While playing around for Advent of Code I wrote a function to animate a sequence of Unicode plots. It’s not much, but I couldn’t find anything similar on Google so I thought I’d share. The move_up helper function is the fiddly part; it moves the cursor to the start of where the plot begins so that a new plot can be printed right on top.
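The cursor trick relies on standard ANSI escape sequences, so it can be sketched in any language. The version below is in Python, not Julia, and is not the code from the post; `animate` and its parameters are hypothetical, and frames are assumed to be plain multi-line strings rather than UnicodePlots objects.

```python
import sys
import time


def move_up(n_lines: int) -> str:
    """Escape sequence that returns the cursor to the start of the previous frame."""
    # ESC[<n>A moves the cursor up n lines; \r returns it to column 0.
    return f"\x1b[{n_lines}A\r" if n_lines > 0 else "\r"


def animate(frames, delay=0.1, out=sys.stdout):
    """Draw each multi-line frame on top of the one before it."""
    previous_height = 0
    for frame in frames:
        lines = frame.split("\n")
        out.write(move_up(previous_height))
        for line in lines:
            # ESC[2K clears the whole line so a shorter frame leaves no residue.
            out.write("\x1b[2K" + line + "\n")
        out.flush()
        previous_height = len(lines)
        time.sleep(delay)
```

Run in a terminal, `animate(["o..\n...", ".o.\n...", "..o\n..."])` shows a dot sliding across the first row. The sketch assumes every frame has the same number of lines; if a later frame is shorter, rows from the taller frame stay visible beneath it.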

Serverless, On-Demand, Parametrised R Markdown Reports with AWS Lambda

I have a URL with a colour parameter, like “https://example.com/diamonds?colour=H”. When I go to this URL in my browser, an AWS Lambda instance takes that parameter and passes it to rmarkdown::render, which knits a customised R Markdown report. My Lambda returns the knitted report as HTML, which my browser displays. If I change the parameter to “colour=G”, I get a different report, knitted on-demand. This is all serverless, so I only pay each time a report is requested (around $0.

I Tried to Improve how Metaflow Converts R to Python (and I Failed)

Metaflow is one of my favourite R packages. Actually, it’s a Python module, but the R package provides a set of bindings for running R code through Metaflow. Recently I’ve spent a good amount of effort trying to improve the way that R data is translated to the Python side of Metaflow, but I just can’t get it to work. So I thought I’d post about what I’ve learnt. Maybe someone will have an answer.