How I'd like to send an email from R
When I found myself using R in a corporate environment, my workflow went like this:
- Connect to databases
- Do stuff to data
- Email results
Yes, there exist options for presenting results that are a bit more modern than the old-fashioned email—R Markdown, Shiny, or even Slack, for example. But email is embedded in corporate culture and will be around for a long time to come.
I want to set down how I think a send_email function should work in R.
But we can already send emails in R
Just sending an email is nothing new. There are the sendmailR and mailR packages, for example, which use SMTP. Then there’s the gmailr package, which connects to Gmail’s REST API to send (and receive) mail. I’ve played around with doing the same for Outlook (outlookr, anyone?), which also covers the Office365 environment found almost ubiquitously in older corporate environments.
My first attempt at improving my workflow used none of that. I used Duncan Temple Lang’s RDCOMClient (also available on GitHub) to connect to a locally installed copy of Microsoft Outlook. This package allows R to connect to the DCOM architecture. You can think of DCOM as an API for communicating with Microsoft Office in Windows environments.
I’ll talk about the benefits and drawbacks of DCOM later, but the main appeal for me is that I can connect R to the Outlook application installed on my (Windows) computer and let Outlook handle all of that super tricky authentication nonsense. No passwords or OAuth keys, because that stuff is all too hard.
For my use case, I wanted to send things like reports and alerts. That meant sending ggplots and data frames, not as attachments but in the body of the email. With a lot of help from StackExchange, I worked out how to do this with RDCOMClient. I wanted to do more than just send an email in R. I wanted emailing from within R to feel like a natural extension of the language.
The prototype: RDCOMOutlook
I’ve been playing around with RDCOMClient for a while. It was even responsible for my first StackExchange answer. But all of the stuff I had learnt was scattered across a dozen stray R scripts. So I spent a week turning everything I had done with RDCOMClient into a package called RDCOMOutlook, available on GitHub.
I want to be clear here: this package is a proof-of-concept, and I have no plans to develop it any further. I’m not submitting it to CRAN, especially since RDCOMClient itself is no longer available on CRAN. But developing the package helped me realise what I wanted a send_email function to look like.
Actually, in RDCOMOutlook it’s called prepare_email. You can do this thing with DCOM where you get the email to pop up on the user’s screen without immediately sending. I thought that was cool, and I made it the default behaviour, with a send argument as an option.
The prototype: prepare_email
Here’s the head of the prepare_email function in RDCOMOutlook:
prepare_email <- function(
  embeddings = NULL,
  body = "",
  to = "",
  cc = "",
  subject = "",
  attachments = NULL,
  css = "",
  send = FALSE,
  data_file_format = "csv",
  image_file_format = "png"
)
You can see some expected stuff in there: emails have bodies, subjects, recipients and (optionally) cc’d recipients and attachments. These are HTML emails, so you can even use some custom CSS (I used this to put some company colours into my reports). None of the arguments are required; running prepare_email() causes a blank Outlook composition window to pop up on the user’s screen.
But embeddings, data_file_format and image_file_format are a bit weirder. And embeddings is the first argument. The first argument in an R function is in a privileged position, because that’s the default target for the pipe (%>%).
Here’s what happens when you give object obj to the embeddings argument:
- If obj is a ggplot, it will be embedded into the body of the email as a reasonably sized image.
- If obj is a data frame or tibble, it will be converted into an HTML table and embedded into the body of the email.
- If obj is a file path pointing to an image file, it will be embedded into the body of the email.
- If obj is a file path pointing to a file that isn’t an image, it will be passed to the attachments argument.
- Failing all of that, an error is thrown:
obj is not a ggplot, data frame, tibble or valid file path. Check that the file exists.
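The dispatch rules above can be sketched in a few lines of base R. This is a minimal illustration rather than the code in RDCOMOutlook: the function name classify_embedding, the labels it returns and the list of image extensions are all assumptions made for the example.

```r
# Hypothetical helper (not from RDCOMOutlook): decide how obj should be handled.
classify_embedding <- function(obj) {
  # Assumed list of extensions that count as images
  image_extensions <- c("png", "jpg", "jpeg", "gif", "bmp")
  if (inherits(obj, "ggplot")) {
    "plot"
  } else if (is.data.frame(obj)) { # tibbles are data frames too
    "table"
  } else if (is.character(obj) && length(obj) == 1 && file.exists(obj)) {
    if (tolower(tools::file_ext(obj)) %in% image_extensions) {
      "image"
    } else {
      "attachment"
    }
  } else {
    stop(
      "obj is not a ggplot, data frame, tibble or valid file path. ",
      "Check that the file exists.",
      call. = FALSE
    )
  }
}
```

Checking the ggplot case first matters because a ggplot is also a list, so a looser test further down could otherwise swallow it.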
A benefit of DCOM is that you can get the user’s email signature as defined in Outlook. So I put the embedding between the provided body and the signature.
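For the data frame case, the markup that sits between the body and the signature could be produced along these lines. This is a base R sketch under my own assumptions: df_to_html is a hypothetical name, and the prototype may well build its tables differently.

```r
# Hypothetical helper: turn a data frame into a bare-bones HTML table.
df_to_html <- function(df) {
  # Escape the characters that would otherwise be read as markup
  escape <- function(x) {
    x <- gsub("&", "&amp;", as.character(x), fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
  }
  header <- paste0("<th>", escape(names(df)), "</th>", collapse = "")
  # Build one string of <td> cells per row
  rows <- vapply(seq_len(nrow(df)), function(i) {
    values <- vapply(df, function(column) as.character(column[[i]]), character(1))
    paste0("<td>", escape(values), "</td>", collapse = "")
  }, character(1))
  paste0(
    "<table><tr>", header, "</tr>",
    paste0("<tr>", rows, "</tr>", collapse = ""),
    "</table>"
  )
}

table_html <- df_to_html(data.frame(x = 1:2, y = c("a", "b")))
```

The resulting string can then be pasted between the body text and the signature HTML.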
The attachments argument follows similar logic, except that a plot or data frame/tibble is attached as a file rather than embedded. This is where the file format arguments come into play. I like data_file_format: you might want to send an Excel file, for example. But I think we can do without the image_file_format argument. Does anyone really care if their image is a jpeg or a png?
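To make the role of data_file_format concrete, here is a rough sketch of how a data frame might be written out before attaching. The helper name write_attachment and the set of supported formats are assumptions for illustration; an Excel branch would need an extra package, so it is left out.

```r
# Hypothetical helper: write a data frame to a temporary file for attaching.
write_attachment <- function(df, name, data_file_format = "csv") {
  path <- file.path(tempdir(), paste0(name, ".", data_file_format))
  switch(
    data_file_format,
    csv = utils::write.csv(df, path, row.names = FALSE),
    tsv = utils::write.table(df, path, sep = "\t", row.names = FALSE),
    # Anything else is rejected rather than silently written as csv
    stop("Unsupported data_file_format: ", data_file_format, call. = FALSE)
  )
  path
}

attachment_path <- write_attachment(head(mtcars), "cars")
```

Returning the path keeps the helper independent of whatever ends up doing the sending.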
What happens in the background?
To embed or attach a ggplot, we need to save it as a file in a temporary location. We attach the file and, if we’re embedding it, refer to the file name in an HTML tag using a content identifier (cid). This tells the email client that it needs to show the attachment in the body of the email.
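As an illustration of the cid reference, the HTML for an embedded image might be built like this. The helper name cid_image_tag is hypothetical, and the sketch assumes the cid is simply the attached file’s name, as described above.

```r
# Hypothetical helper: build the HTML tag that points at an attached image.
cid_image_tag <- function(file_name, width, height) {
  sprintf(
    '<img src="cid:%s" width="%d" height="%d">',
    file_name, as.integer(width), as.integer(height)
  )
}

tag <- cid_image_tag("plot.png", 800, 600)
```

The tag only works if a file with exactly that name is attached to the same email.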
When I first tried to do this I got some warped ggplots. You need to specify image dimensions in HTML, but that means getting the image dimensions. The readbitmap package is crucial here, since it lets me inspect the most commonly used image formats.
At one point, I was inspecting file headers to try to guess the image format!
# Read the first 8 bytes of the file
file_header <- readBin(file_path, "raw", n = 8)

# Reference headers
png_header <- as.raw(strtoi(c("89", "50", "4e", "47", "0d", "0a", "1a", "0a"), 16))
jpg_header <- as.raw(strtoi(c("FF", "D8"), 16))
bmp_header <- as.raw(strtoi(c("42", "4D"), 16))
gif_header <- as.raw(strtoi(c("47", "49", "46"), 16))

# Compare the leading bytes against each reference header
format <- if (identical(file_header, png_header)) {
  "png"
} else if (identical(file_header[1:2], jpg_header)) {
  "jpg"
} else if (identical(file_header[1:2], bmp_header)) {
  "bmp"
} else if (identical(file_header[1:3], gif_header)) {
  "gif"
} else {
  "unknown"
}
The images then have to be scaled down to a reasonable maximum size (I used 800 pixels in either dimension), while preserving the image ratio.
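The scaling step is just a bit of arithmetic. A minimal sketch, assuming the pixel width and height have already been read from the file (scale_dimensions is a made-up name for this example):

```r
# Hypothetical helper: shrink dimensions to fit within max_size pixels,
# preserving the aspect ratio. Images that already fit are left alone.
scale_dimensions <- function(width, height, max_size = 800) {
  scale <- min(1, max_size / max(width, height))
  c(width = round(width * scale), height = round(height * scale))
}

scale_dimensions(1600, 1200) # width 800, height 600
scale_dimensions(400, 300)   # unchanged: width 400, height 300
```

Capping the scale factor at 1 means small images are never stretched.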
There’s also the matter of turning stuff into a list in R. I can run lists through purrr functions to embed/attach multiple files. But am I the only one who finds this really hard? Check out this hideous helper function I used:
make_list <- function(x) {
  if (is.null(x)) {
    x
  } else if (is.ggplot(x)) { # ggplots are lists
    list(x)
  } else if (is.data.frame(x)) {
    list(x)
  } else if (is.list(x)) {
    x
  } else if (is.vector(x)) {
    as.list(x)
  } else {
    list(x) # single item case
  }
}
Lists are also important here because lists can have names, and we need those for embeddings and attachments. If the user puts only obj into the embeddings or attachments argument, prepare_email will attach, for example, obj.png. With a list of embeddings or attachments, it will use the names in the list. If these aren’t available, or if the object is named . (as would be the case if it is coming from a pipe), sensible dummy names are used. File names aren’t visible for embeddings, but we do need to ensure that they don’t conflict, or else the cid tags will get confused.
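The naming rules could look something like the following. This is a sketch rather than the prototype’s code: name_items and the "item" prefix are assumptions, and make.unique is just one convenient way to avoid clashing file names.

```r
# Hypothetical helper: give every list item a usable, unique file stem.
name_items <- function(items, prefix = "item") {
  nms <- names(items)
  if (is.null(nms)) nms <- rep("", length(items))
  # Treat absent names and the pipe placeholder "." as missing
  missing <- is.na(nms) | nms %in% c("", ".")
  if (any(missing)) nms[missing] <- paste0(prefix, which(missing))
  # Duplicates get a numeric suffix so that cid references stay distinct
  names(items) <- make.unique(nms, sep = "_")
  items
}

names(name_items(list(sales = 1, 2, sales = 3)))
#> [1] "sales"   "item2"   "sales_1"
```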
The ideal send_email function
As I said, I don’t have any plans to develop this package any further. RDCOMOutlook is great for my situation, but it’s not a modern answer. For one thing, it only works on Windows, and only with Outlook. DCOM itself is old and the documentation is non-existent; there were times here where I was literally guessing function names.
But most of the hard stuff is just juggling list names and image dimensions. That doesn’t use DCOM. So why can’t I take what I’ve done and stick in some other way of sending emails? So maybe the new function, send_email, will have something like a connection argument?
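One way to picture a connection argument is S3 dispatch, where each backend supplies its own delivery method. Everything below is hypothetical: the deliver generic, the dry_run class and the trimmed-down send_email signature exist only to show the shape of the idea, and no email is actually sent.

```r
# Hypothetical generic: each backend (SMTP, gmailr, ...) would add a method.
deliver <- function(connection, email) UseMethod("deliver")

# A stand-in backend that returns a summary instead of sending anything.
deliver.dry_run <- function(connection, email) {
  sprintf("To: %s | Subject: %s", email$to, email$subject)
}

# Hypothetical send_email: assemble the email, then hand it to the backend.
send_email <- function(embeddings = NULL, body = "", to = "", subject = "",
                       connection = structure(list(), class = "dry_run")) {
  email <- list(embeddings = embeddings, body = body, to = to, subject = subject)
  deliver(connection, email)
}

send_email(to = "someone@example.com", subject = "Weekly report")
#> [1] "To: someone@example.com | Subject: Weekly report"
```

The list juggling and image handling would live in send_email, so swapping backends would not touch them.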
Without DCOM I do lose that nifty ability to make an email pop up on the screen instead of sending it. That’s why I have to drop the prepare_email function name. I might also lose the ability to pick up the user’s signature.
Here’s a possible way to move away from DCOM:
- Focus on getting the prototype to work with SMTP. I imagine this covers the majority of use cases.
- Bring in compatibility with gmailr.
- Using gmailr as a guide, create outlookr and bring it into the fold.
I’ve actually had a fair bit of luck accessing the Outlook API using the wonderful httr package. I can authenticate and download email attachments. But turning all of that into a proper package with good credential handling would be a challenge.
Bonus goal: searching emails
I built something else for the RDCOMOutlook prototype: the ability to search for emails and download attachments. The results are displayed in a nice, pretty tibble:
RDCOMOutlook::search_emails("test") %>% select(subject, received, attachments)
#> # A tibble: 3 x 3
#>   subject                         received            attachments
#>   <chr>                           <dttm>              <chr>
#> 1 This is a test email            2018-06-12 16:42:42 ""
#> 2 Another test                    2018-06-12 17:36:08 ""
#> 3 A test email with an attachment 2018-06-12 17:36:36 "shiborgi.jpg"
The problem here is that the AdvancedSearch method of DCOM is asynchronous; that is, the search will continue to run in the background while R continues with the next statement. There is an AdvancedSearchComplete event, but I wasn’t able to work out how to handle DCOM events. There is a package, called RDCOMEvents, that sounds suitable for this.
But I was able to download attachments from an Office365 email account using the Outlook REST API. I believe that gmailr can do the same. So I can probably recreate this without DCOM. This is a stretch goal, and probably a distraction, but it does seem like nice functionality to have.
Sources
The header image at the top of this page is modified from an image in the public domain.
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os Ubuntu 20.04 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-06-13
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> downlit 0.0.0.9000 2020-06-12 [1] Github (r-lib/downlit@87fb1af)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> hugodown 0.0.0.9000 2020-06-12 [1] Github (r-lib/hugodown@6812ada)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)
#> pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.7 2020-04-25 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> readr 1.3.1 2018-12-21 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.2.3 2020-06-12 [1] Github (rstudio/rmarkdown@4ee96c8)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 3.0.1 2020-04-20 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.3.1 2020-06-05 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.14 2020-05-20 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library