The last few weeks have been all about R package development for me. First I was exploring GitHub Actions with the lovely people at the rOpenSci OzUnconf, and then I was off to San Francisco to learn about Building Tidy Tools with the Wickham siblings. I’ve picked up a lot about package development, so I’m documenting some of the trickier things that I’ve learnt.
A great resource for package development is Hadley’s book.
There’s a concept in R of an analysis as a package, in which everything you need for your data analysis is contained within a custom package. When you install the package and build the vignettes, the data analysis is performed and results saved as a pretty HTML or PDF file, generated with R Markdown. I wanted to extend this concept to a machine learning model as a package.
The idea here is that, using vignettes, we can make installing a package equivalent to training a model.
If you listen to university advertisements for data science master’s degrees, you’d believe that data scientists are so in demand that they can walk into any company, state their salary, and start work straight away.
Not quite.
Interviewing for data science positions is tough, and job-seekers face some bad behaviour from recruiters and hiring managers. Many companies understand that they need to do something with data, but they don’t know what. They’ll say they want machine learning when they really want a few dashboards.
Whenever I take an interest in something I think to myself, “How can I combine this with R?”
This post is the result of applying that attitude to Dungeons and Dragons.
So how would I combine D&D with R? A good start would be to have a nice data set of Dungeons and Dragons monsters, with all of their statistics, abilities and attributes. One of the core D&D rule books is the Monster Manual.
When I found myself using R in a corporate environment, my workflow went like this:
1. Connect to databases
2. Do stuff to data
3. Email results

Yes, there exist options for presenting results that are a bit more modern than the old-fashioned email: R Markdown, Shiny, or even Slack, for example. But email is embedded in corporate culture and will be around for a long time to come.
I want to set down how I think a send_email function should work in R.
That’s it for #useR2018. After 6 keynotes, 132 parallel sessions, many more lightning talks and posters, and an all-important conference dinner, we’ve reached the end of the week.
This was my first proper conference since 2015. I had almost forgotten how it felt to be surrounded by hundreds of people who are just as passionate as you are (if not more so) about your tiny area of specialised knowledge.
I took notes for the three tutorials I went to, but I wanted to take a moment to review the week as a whole, including the talks that stood out to me.
These are my notes for the super helpful tutorial given by Elizabeth Stark on the first day of the useR! 2018 conference. This was an introduction to Docker for R users who have no prior experience with Docker (which was me!).
Elizabeth’s slides
Elizabeth’s exercises and examples

This tutorial took me through setting up an RStudio Server container. I’m on a Linux machine, but I’m particularly interested in the idea that you could run these traditionally Linux-only servers on a Windows machine through Docker.
These are my notes for the tutorial given by Max Kuhn on the afternoon of the first day of the useR! 2018 conference.
Full confession here: I was having trouble deciding between this tutorial and another one, and eventually decided on the other one. But then I accidentally came to the wrong room and I took it as a sign that it was time to learn more about preprocessing.
Also, the recipes package is adorable.
My knowledge of wine covers three facts:
1. I like red wine.
2. I do not like white wine.
3. I love wine data.

I came across a great collection of around 130,000 wine reviews, each a paragraph long, on Kaggle. This is juicy stuff, and I can’t wait to dig into it with some text analysis, or maybe build some sort of Markov chain or neural network that generates new wine reviews.
If you work in a corporate environment, there’s a good chance you’re using Microsoft Office. I wanted to set up a way to email tables and plots from R using Outlook. Sending an email is simple enough with the RDCOMClient library, but inserting a plot inline—rather than as an attachment—took a little bit of working out. I’m sharing my code here in case anyone else wants to do something similar. The trick is to save your plot as an image with a temporary file, attach it to the email, and then insert it inline using a cid (Content-ID).
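As a rough illustration of that trick, here is a minimal sketch rather than the exact code from the full post. It assumes a Windows machine with Outlook installed and the RDCOMClient package available. The helper names `build_inline_body()` and `send_plot_email()` are made up for this example, and it assumes Outlook will resolve a `cid:` reference that matches the attachment's file name.

```r
# Sketch only: assumes Windows, Outlook, and the RDCOMClient package.
# build_inline_body() and send_plot_email() are hypothetical helper names.

# Build an HTML body that refers to an attached image by its Content-ID.
# Assumption: Outlook matches "cid:<file name>" to the attachment of that name.
build_inline_body <- function(image_path, intro = "Here is the plot:") {
  cid <- basename(image_path)
  paste0("<p>", intro, "</p><img src=\"cid:", cid, "\">")
}

send_plot_email <- function(to, subject, plot_fun) {
  if (!requireNamespace("RDCOMClient", quietly = TRUE)) {
    stop("RDCOMClient is required (Windows with Outlook only).")
  }

  # Save the plot as an image in a temporary file.
  image_path <- tempfile(fileext = ".png")
  png(image_path, width = 600, height = 400)
  plot_fun()
  dev.off()
  image_path <- normalizePath(image_path)

  # Create the email, attach the image, then reference it inline via cid.
  outlook <- RDCOMClient::COMCreate("Outlook.Application")
  mail <- outlook$CreateItem(0)  # 0 is Outlook's olMailItem constant
  mail[["To"]] <- to
  mail[["Subject"]] <- subject
  mail[["Attachments"]]$Add(image_path)
  mail[["HTMLBody"]] <- build_inline_body(image_path)
  mail$Send()

  invisible(image_path)
}
```

With those definitions in place, a call such as `send_plot_email("someone@example.com", "Weekly plot", function() plot(cars))` would send the built-in `cars` scatter plot inline. The address is a placeholder.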