Printing data frames with metadata

2020-05-19

I’m creating an R API wrapper around my state’s public transport service. To make life easier for the users, the responses from the API calls are parsed and returned as tibbles/data frames. To make life easier for me, I need to keep track of the API call behind each tibble. I do this by using the tibble::new_tibble() function to attach metadata to the tibble as attributes, and creating a custom print method to make the metadata visible.

First, the raw response from the API call is structured as a list:

response <- list(
    request = request_url_without_auth,
    retrieved = format(
      Sys.time(),
      format = "%Y-%m-%d %H:%M:%OS %Z",
      tz = "Australia/Melbourne"
    ),
    status_code = status_code,
    content = content
)

Some other function will take the content in this list, process it, and create a parsed tibble. We hand this off to the tibble::new_tibble() function. Along with the new class name — “ptvapi” — we also attach the response metadata as attributes to the new tibble.

tibble::new_tibble(
    parsed,
    nrow = nrow(parsed),
    class = "ptvapi",
    request = response$request,
    retrieved = response$retrieved,
    status_code = response$status_code,
    content = response$content
  )

Let’s say we have a tibble flinders_departures created through this process. flinders_departures will have the following classes, in order: “ptvapi”, “tbl_df”, “tbl”, and “data.frame”.

For those who are unfamiliar to S3, some functions in R — like print(), summary() and predict() — are generics. When we call a generic, R will look through the classes of the argument to find the right method to call. When we call print(flinders_departures) (or, equivalently, enter flinders_departures at the console) R will first look for the print.ptvapi() method. If it can’t find that, it will move on to print.tbl_df(), and so on, until it tries print.default().

If I were to let R go through this method I would print flinders_departures without printing the metadata. So I created a simple way of printing out those attributes:

print.ptvapi <- function(x) {
  if (!is.null(attr(x, "request"))) {
    cat("Request:", attr(x, "request"), "\n")
  }
  if (!is.null(attr(x, "retrieved"))) {
    cat("Retrieved:", attr(x, "retrieved"), "\n")
  }
  if (!is.null(attr(x, "status_code"))) {
    cat("Status code:", attr(x, "status_code"), "\n")
  }
  NextMethod()
}

This method will print out the attributes if they exist. NextMethod() will then make R go down the class chain, until it prints like a regular tibble. This is great for debugging. The response of the API call is (more or less) determined by the three attributes I specifically print. So it makes life much easier for me to be able relate the parsed tibble to the API response.

> flinders_departures
Request: http://timetableapi.ptv.vic.gov.au/v3/departures/route_type/0/stop/1071?max_results=5&date_utc=2020-05-18T12:14:10&include_cancelled=false 
Retrieved: 2020-05-18 22:14:11 AEST 
Status code: 200 
# A tibble: 75 x 11
   direction_id stop_id route_id run_id platform_number at_platform departure_seque…
          <int>   <int>    <int>  <int> <chr>           <lgl>                  <int>
 1            5    1071        6 952531 9               FALSE                      0
 2            2    1071        3 953881 5               FALSE                      0
 3           13    1071       14 954655 4               FALSE                      0
 4            6    1071        7 950675 2               FALSE                      0
 5            9    1071        5 949763 1               FALSE                      0
 6           16    1071       17 954539 9               FALSE                      0
 7           11    1071       12 988175 13              FALSE                      0
 8           10    1071       11 952689 6               FALSE                      0
 9            8    1071        9 951849 3               FALSE                      0
10           14    1071       15 953653 4               FALSE                      0
# … with 65 more rows, and 4 more variables: scheduled_departure <dttm>,
#   estimated_departure <dttm>, flags <chr>, disruption_ids <list>

Most importantly, because I haven’t defined any methods like mutate.ptvapi(), every generic other than print() will treat this tibble as a tibble. So all of my data manipulation functions will ignore the metadata I’ve attached to this tibble.

A quick heads up: it’s not guaranteed that every function will preserve attributes. So after manipulation, the “ptvapi” class may be lost, along with the metadata. That’s fine for my purpose, but maybe not for yours.

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Ubuntu 20.04 LTS            
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_AU:en                    
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2020-06-14                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                            
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                    
#>  backports     1.1.7      2020-05-13 [1] CRAN (R 4.0.0)                    
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 4.0.0)                    
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)                    
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                    
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                    
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 4.0.0)                    
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)                    
#>  downlit       0.0.0.9000 2020-06-12 [1] Github (r-lib/downlit@87fb1af)    
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                    
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                    
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                    
#>  fs            1.4.1      2020-04-04 [1] CRAN (R 4.0.0)                    
#>  glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)                    
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 4.0.0)                    
#>  hugodown      0.0.0.9000 2020-06-12 [1] Github (r-lib/hugodown@6812ada)   
#>  knitr         1.28       2020-02-06 [1] CRAN (R 4.0.0)                    
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)                    
#>  memoise       1.1.0.9000 2020-05-09 [1] Github (hadley/memoise@4aefd9f)   
#>  pkgbuild      1.0.7      2020-04-25 [1] CRAN (R 4.0.0)                    
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 4.0.0)                    
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                    
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 4.0.0)                    
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 4.0.0)                    
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)                    
#>  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 4.0.0)                    
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 4.0.0)                    
#>  rlang         0.4.6      2020-05-02 [1] CRAN (R 4.0.0)                    
#>  rmarkdown     2.2.3      2020-06-12 [1] Github (rstudio/rmarkdown@4ee96c8)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)                    
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                    
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)                    
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                    
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)                    
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.0)                    
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)                    
#>  xfun          0.14       2020-05-20 [1] CRAN (R 4.0.0)                    
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                    
#> 
#> [1] /home/mdneuzerling/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library