## Markdown monsters

Whenever I take an interest in something I think to myself, “How can I combine this with R?”

This post is the result of applying that attitude to Dungeons and Dragons.

So how would I combine D&D with R? A good start would be to have a nice data set of Dungeons and Dragons monsters, with all of their statistics, abilities and attributes. One of the core D&D rule books is the Monster Manual. I could attempt to scrape the Monster Manual but I figured that the lovely people behind D&D wouldn’t be too happy if I uploaded most of the book to GitHub!

Fortunately, Wizards of the Coast have included around 300 monsters in the Systems Reference Document (SRD) for 5th edition D&D. This is made available under the Open Gaming License Version 1.0a.

I started off trying to scrape the SRD directly, but scraping a PDF was looking to be a nightmare. Fortunately, vitusventure had already converted the SRD to markdown documents to host the content on the (rather pretty) https://5thsrd.org/. I figured it would be easier to scrape markdown files, since they are structured but simple text. Here’s an example of a monster’s “stat block” written in markdown:

name: Medusa
type: monstrosity
cr: 6

# Medusa

Medium monstrosity, lawful evil

Armor Class 15 (natural armor)
Hit Points 127 (17d8 + 51)
Speed 30 ft.

| STR     | DEX     | CON     | INT     | WIS     | CHA     |
|---------|---------|---------|---------|---------|---------|
| 10 (+0) | 15 (+2) | 16 (+3) | 12 (+1) | 13 (+1) | 15 (+2) |

Skills Deception +5, Insight +4, Perception +4, Stealth +5
Senses darkvision 60 ft., passive Perception 14
Languages Common
Challenge 6 (2,300 XP)

Petrifying Gaze. When a creature that can see the medusa’s eyes starts its turn within 30 feet of the medusa, the medusa can force it to make a DC 14 Constitution saving throw if the medusa isn’t incapacitated and can see the creature. If the saving throw fails by 5 or more, the creature is instantly petrified. Otherwise, a creature that fails the save begins to turn to stone and is restrained. The restrained creature must repeat the saving throw at the end of its next turn, becoming petrified on a failure or ending the effect on a success. The petrification lasts until the creature is freed by the greater restoration spell or other magic.
Unless surprised, a creature can avert its eyes to avoid the saving throw at the start of its turn. If the creature does so, it can’t see the medusa until the start of its next turn, when it can avert its eyes again. If the creature looks at the medusa in the meantime, it must immediately make the save.
If the medusa sees itself reflected on a polished surface within 30 feet of it and in an area of bright light, the medusa is, due to its curse, affected by its own gaze.

### Actions

Multiattack. The medusa makes either three melee attacks–one with its snake hair and two with its shortsword–or two ranged attacks with its longbow.
Snake Hair. Melee Weapon Attack: +5 to hit, reach 5 ft., one creature. Hit: 4 (1d4 + 2) piercing damage plus 14 (4d6) poison damage.
Shortsword. Melee Weapon Attack: +5 to hit, reach 5 ft., one target. Hit: 5 (1d6 + 2) piercing damage.
Longbow. Ranged Weapon Attack: +5 to hit, range 150/600 ft., one target. Hit: 6 (1d8 + 2) piercing damage plus 7 (2d6) poison damage.

I put the monsters scraped from the SRD into the monsters data set and uploaded it to a quickly created monstr package. You can access the data set by installing the package with devtools::install_github("mdneuzerling/monstr").

library(monstr)
skimr::skim(monstr::monsters) %>% skimr::kable()

Skim summary statistics
n obs: 317
n variables: 39

Variable type: character

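| variable    | missing | complete | n   | min | max | empty | n_unique |
|-------------|---------|----------|-----|-----|-----|-------|----------|
| ac_note     | 108     | 209      | 317 | 5   | 23  | 0     | 22       |
| alignment   | 0       | 317      | 317 | 7   | 40  | 0     | 16       |
| description | 274     | 43       | 317 | 96  | 654 | 0     | 43       |
| hp          | 0       | 317      | 317 | 3   | 11  | 0     | 168      |
| languages   | 0       | 317      | 317 | 1   | 91  | 0     | 86       |
| name        | 0       | 317      | 317 | 3   | 25  | 0     | 317      |
| senses      | 0       | 317      | 317 | 20  | 170 | 0     | 88       |
| size        | 0       | 317      | 317 | 4   | 10  | 0     | 6        |
| speed       | 0       | 317      | 317 | 4   | 52  | 0     | 90       |
| type        | 0       | 317      | 317 | 3   | 30  | 0     | 35       |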

Variable type: list

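| variable | missing | complete | n   | n_unique | min_length | median_length | max_length |
|----------|---------|----------|-----|----------|------------|---------------|------------|
| actions  | 3       | 314      | 317 | 305      | 1          | 2             | 10         |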

Variable type: numeric

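| variable        | missing | complete | n   | mean   | sd       | p0 | p25 | p50 | p75  | p100   | hist     |
|-----------------|---------|----------|-----|--------|----------|----|-----|-----|------|--------|----------|
| ac              | 0       | 317      | 317 | 14.07  | 3.27     | 5  | 12  | 13  | 17   | 25     | ▁▂▇▇▃▃▁▁ |
| acrobatics      | 0       | 317      | 317 | 1.12   | 1.64     | -5 | 0   | 1   | 2    | 9      | ▁▁▆▇▂▁▁▁ |
| animal_handling | 0       | 317      | 317 | 0.66   | 1.49     | -5 | 0   | 1   | 1    | 7      | ▁▁▁▇▂▁▁▁ |
| arcana          | 0       | 317      | 317 | -1.07  | 3.46     | -5 | -4  | -2  | 1    | 18     | ▇▆▃▁▁▁▁▁ |
| athletics       | 0       | 317      | 317 | 2.55   | 3.48     | -5 | 0   | 3   | 4    | 14     | ▂▂▇▇▃▂▁▁ |
| cha             | 0       | 317      | 317 | 9.79   | 5.76     | 0  | 5   | 8   | 14   | 30     | ▃▇▅▃▂▂▁▁ |
| con             | 0       | 317      | 317 | 15.16  | 4.5      | 8  | 12  | 14  | 18   | 30     | ▃▇▇▃▂▁▁▁ |
| cr              | 0       | 317      | 317 | 4.6    | 5.91     | 0  | 0.5 | 2   | 6    | 30     | ▇▂▁▁▁▁▁▁ |
| deception       | 0       | 317      | 317 | -0.11  | 3.37     | -5 | -3  | -1  | 2    | 11     | ▇▇▅▃▃▁▁▁ |
| dex             | 0       | 317      | 317 | 12.61  | 3.22     | 1  | 10  | 13  | 15   | 28     | ▁▁▇▇▆▁▁▁ |
| history         | 0       | 317      | 317 | -1.05  | 3.46     | -5 | -4  | -2  | 1    | 13     | ▇▃▅▂▁▁▁▁ |
| hp_avg          | 0       | 317      | 317 | 82.31  | 99.88    | 1  | 18  | 45  | 114  | 676    | ▇▂▁▁▁▁▁▁ |
| insight         | 0       | 317      | 317 | 0.95   | 2.12     | -5 | 0   | 1   | 1    | 10     | ▁▁▇▇▂▁▁▁ |
| int             | 0       | 317      | 317 | 7.86   | 5.69     | 1  | 2   | 7   | 12   | 25     | ▇▃▃▃▂▁▁▁ |
| intimidation    | 0       | 317      | 317 | -0.35  | 2.94     | -5 | -3  | -1  | 2    | 10     | ▃▇▅▅▃▁▁▁ |
| investigation   | 0       | 317      | 317 | -1.26  | 2.94     | -5 | -4  | -2  | 1    | 7      | ▇▃▁▅▁▂▁▁ |
| medicine        | 0       | 317      | 317 | 0.68   | 1.55     | -5 | 0   | 1   | 1    | 7      | ▁▁▁▇▂▁▁▁ |
| nature          | 0       | 317      | 317 | -1.26  | 2.93     | -5 | -4  | -2  | 1    | 7      | ▇▃▁▅▁▂▁▁ |
| perception      | 0       | 317      | 317 | 2.87   | 4.02     | -5 | 0   | 2   | 4    | 17     | ▁▇▇▅▁▁▁▁ |
| performance     | 0       | 317      | 317 | -0.36  | 2.94     | -5 | -3  | -1  | 2    | 10     | ▃▇▅▃▃▁▁▁ |
| persuasion      | 0       | 317      | 317 | -0.19  | 3.34     | -5 | -3  | -1  | 2    | 16     | ▆▇▃▃▁▁▁▁ |
| religion        | 0       | 317      | 317 | -1.18  | 3.12     | -5 | -4  | -2  | 1    | 15     | ▇▆▃▂▁▁▁▁ |
| sleight_of_hand | 0       | 317      | 317 | 1.11   | 1.62     | -5 | 0   | 1   | 2    | 9      | ▁▁▆▇▂▁▁▁ |
| stealth         | 0       | 317      | 317 | 2.3    | 2.56     | -5 | 0   | 2   | 4    | 10     | ▁▁▆▇▆▃▁▁ |
| str             | 0       | 317      | 317 | 15.34  | 6.63     | 1  | 11  | 16  | 19   | 30     | ▂▂▂▅▇▂▂▂ |
| survival        | 0       | 317      | 317 | 0.7    | 1.54     | -5 | 0   | 1   | 1    | 7      | ▁▁▁▇▂▁▁▁ |
| wis             | 0       | 317      | 317 | 11.72  | 2.98     | 0  | 10  | 12  | 13   | 25     | ▁▁▁▇▃▁▁▁ |
| xp              | 0       | 317      | 317 | 4275.3 | 12436.13 | 0  | 100 | 450 | 2300 | 155000 | ▇▁▁▁▁▁▁▁ |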

Because it’s an R crime to introduce a new data set without a ggplot, here is the relationship between strength and constitution, faceted by monster size:

library(dplyr)
library(ggplot2)

monsters %>%
  ggplot(aes(x = str, y = con)) +
  geom_point() +
  facet_wrap(~ size, nrow = 2)

One note before we go on: “monster” is a generic term. This data set contains bandits which, while of questionable moral character, are not necessarily monstrous. You can also find a simple frog in this data set, capable of nothing more than a ribbit. We refer to them all as “monsters”, perhaps unfairly!

## Scraping line-by-line

Let’s take the Medusa monster above, loaded as a single string. I’m going to make life easier for myself by separating the string into lines. At first I tried to do this myself with strsplit, but please take my advice: use the stringi package. You’ll notice that I turn the resulting list into a single-column tibble. I won’t lie: I find manipulating lists directly difficult, so being able to use dplyr verbs makes me happy. I’m also going to remove the italics (represented in markdown by underscores) since I won’t need them here.

lines <- monster %>%
  stringi::stri_split_lines(omit_empty = TRUE) %>%
  unlist %>%
  as_tibble %>% # much easier to deal with than lists
  mutate_all(trimws) %>%
  mutate_all(function(x) gsub("_", "", x)) # remove italics
print(lines, n = nrow(lines))
## # A tibble: 23 x 1
##    value
##    <chr>
##  1 name: Medusa
##  2 type: monstrosity
##  3 cr: 6
##  4 # Medusa
##  5 Medium monstrosity, lawful evil
##  6 **Armor Class** 15 (natural armor)
##  7 **Hit Points** 127 (17d8 + 51)
##  8 **Speed** 30 ft.
##  9 | STR     | DEX     | CON     | INT     | WIS     | CHA     |
## 10 |---------|---------|---------|---------|---------|---------|
## 11 | 10 (+0) | 15 (+2) | 16 (+3) | 12 (+1) | 13 (+1) | 15 (+2) |
## 12 **Skills** Deception +5, Insight +4, Perception +4, Stealth +5
## 13 **Senses** darkvision 60 ft., passive Perception 14
## 14 **Languages** Common
## 15 **Challenge** 6 (2,300 XP)
## 16 **Petrifying Gaze.** When a creature that can see the medusa's eyes sta…
## 17 Unless surprised, a creature can avert its eyes to avoid the saving thr…
## 18 If the medusa sees itself reflected on a polished surface within 30 fee…
## 19 ### Actions
## 20 **Multiattack.** The medusa makes either three melee attacks--one with …
## 21 **Snake Hair.** Melee Weapon Attack: +5 to hit, reach 5 ft., one creatu…
## 22 **Shortsword.** Melee Weapon Attack: +5 to hit, reach 5 ft., one target…
## 23 **Longbow.** Ranged Weapon Attack: +5 to hit, range 150/600 ft., one ta…

## Scraping monster name, type and CR

The wonderful thing about these markdown files is that they have a nifty couple of lines up the top listing the name, type and challenge rating (cr) of the monster. These are marked by headings with colons, so we’ll define a function to extract the data based on that.

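extract_from_colon_heading <- function(lines, heading) {
  label <- paste0("^", heading, ":")
  lines %>%
    filter(grepl(label, value)) %>% # assumed step: keep the "heading: ..." line
    pull(value) %>%
    gsub(label, "", .) %>% # assumed step: drop the label itself
    as.character %>%
    trimws
}

c(
  extract_from_colon_heading(lines, "name"),
  extract_from_colon_heading(lines, "type"),
  extract_from_colon_heading(lines, "cr")
)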
## [1] "Medusa"      "monstrosity" "6"

I should offer some explanations for those new to D&D! The “type” of a monster is a category like beast, undead or—in the case of the Medusa—monstrosity. The challenge rating is a rough measure of difficulty. The Medusa has a challenge rating of 6, so is a suitable encounter for 4 players with characters of level 6. Characters begin at level 1 and move up to level 20 (if the campaign lasts that long).

## Scraping based on bold text

Most of the information we need is labelled by bold text, represented in markdown by double asterisks. We’ll define three functions:

1. identify_bold_text looks for a given bold_text in a string x, and returns a Boolean value.
2. strip_bold_text removes all bolded text from a string x, and trims white space from either end of the result.
3. extract_from_bold_text looks through a list of lines (like the lines defined above) for a particular bold_text. It will return all text in the string except the bold_text. This function uses the two above.

identify_bold_text <- function(x, bold_text) {
  grepl(paste0("\\*\\*", bold_text, "\\*\\*"), x, ignore.case = TRUE)
}

strip_bold_text <- function(x) {
  gsub("\\*\\*(.*?)\\*\\*", "", x, ignore.case = TRUE) %>% trimws
}

extract_from_bold_text <- function(lines, bold_text) {
  lines %>%
    filter(identify_bold_text(value, bold_text)) %>%
    as.character %>%
    strip_bold_text
}

extract_from_bold_text(lines, "Languages")
## [1] "Common"

## Scraping based on brackets

Some of the data we need is found in bracketed information. The extract_bracketed function returns all text inside the first set of brackets found in a string x, or returns NA if no bracketed text is found.

extract_bracketed <- function(x) {
  if (!grepl("\\(.*\\)", x)) {
    return(NA)
  } else {
    gsub(".*\\((.*?)\\).*", "\\1", x)
  }
}

lines %>% extract_from_bold_text("Armor Class") %>% extract_bracketed
## [1] "natural armor"
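
Because extract_bracketed relies on if, it expects a single string. As a minimal sketch using only base R, a vectorised variant might look like the following. The name extract_bracketed_vec is hypothetical and is not part of the monstr package or the original script.

```r
# Hypothetical vectorised variant of extract_bracketed (illustration only).
# Returns the text inside the first pair of brackets in each string, or NA.
extract_bracketed_vec <- function(x) {
  m <- regexpr("\\([^)]*\\)", x)           # first "(...)" in each element, -1 if none
  out <- rep(NA_character_, length(x))
  found <- !is.na(m) & m != -1
  matched <- regmatches(x, m)              # one entry per element that matched
  out[found] <- substr(matched, 2, nchar(matched) - 1)  # drop the brackets
  out
}

extract_bracketed_vec(c("15 (natural armor)", "30 ft.", "127 (17d8 + 51)"))
```

Strings without brackets come back as NA, so the result always has the same length as the input.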

A monster’s armor class (AC) determines how hard it is to hit the creature with a weapon or certain spells. The Medusa has an AC of 15. To attack the Medusa, a player will roll a 20-sided die (d20) and add certain modifiers based on their character’s skills and proficiencies. If the result is at least 15, the attack hits. The “natural armor” note means that the Medusa’s armor class is provided by thickened skin or scales, and not a separate piece of armour.
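
To make the attack roll concrete, here is a rough sketch of the chance to hit a given armor class. The function name hit_chance and the +5 attack bonus are assumptions chosen for illustration, and the sketch deliberately ignores critical hits, automatic misses, advantage and disadvantage.

```r
# Chance that d20 + bonus meets or beats an armor class.
# Simplification (assumption): no critical hits, automatic misses,
# advantage or disadvantage.
hit_chance <- function(ac, bonus) {
  rolls <- 1:20
  mean(rolls + bonus >= ac)
}

# Against the Medusa's AC of 15 with a hypothetical +5 bonus,
# rolls of 10 to 20 succeed, which is 11 of the 20 faces.
hit_chance(ac = 15, bonus = 5)
```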

## Abilities

Player characters and monsters in D&D have six ability scores that influence almost everything that they do: strength, dexterity, constitution, intelligence, wisdom and charisma. These abilities are represented by numeric scores that usually (but not always) fall between 10 and 20, with 10 being “average” and 20 being superb.

In the markdown files, these ability scores are tables. We look for the table header and find the ability scores two rows below.

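ability_header <- min(which(grepl("\\| STR", lines$value), arr.ind = TRUE))
ability_text <- lines$value[ability_header + 2]
ability_vector <- ability_text %>% strsplit("\\|") %>% unlist
ability_vector <- ability_vector[nzchar(trimws(ability_vector))] # drop the empty piece before the first pipe
monster_ability <- readr::parse_number(ability_vector) # assumed step: keep the score, not the bracketed modifier
names(monster_ability) <- c("STR", "DEX", "CON", "INT", "WIS", "CHA")
monster_ability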
## STR DEX CON INT WIS CHA
##  10  15  16  12  13  15
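
For readers who want to try the table-parsing step in isolation, here is a small base R sketch that works on the score row alone. The variable names are illustrative, and the sketch assumes each cell looks like "10 (+0)", as it does in the Medusa stat block.

```r
# Toy input: the third line of the ability table from the Medusa stat block
ability_row <- "| 10 (+0) | 15 (+2) | 16 (+3) | 12 (+1) | 13 (+1) | 15 (+2) |"

# Split on pipes, trim, and drop the empty piece before the first pipe
cells <- trimws(strsplit(ability_row, "|", fixed = TRUE)[[1]])
cells <- cells[nzchar(cells)]
stopifnot(length(cells) == 6)  # guard against a malformed row

# Keep the score, which is everything before the first space
scores <- as.numeric(sub(" .*$", "", cells))
names(scores) <- c("STR", "DEX", "CON", "INT", "WIS", "CHA")
scores
```

The length check is a cheap way to notice a stat block whose table does not have the expected six columns before the names are attached.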

## Skills

Skills represent the monster’s ability to perform activities. There are 18 skills, and each skill is associated with one of the 6 ability scores.

skill_ability <- tribble(
~skill, ~ability_code_upper,
#-------|------------------
"athletics", "STR",
"acrobatics", "DEX",
"sleight_of_hand", "DEX",
"stealth", "DEX",
"arcana", "INT",
"history", "INT",
"investigation", "INT",
"nature", "INT",
"religion", "INT",
"animal_handling", "WIS",
"insight", "WIS",
"medicine", "WIS",
"perception", "WIS",
"survival", "WIS",
"deception", "CHA",
"intimidation", "CHA",
"performance", "CHA",
"persuasion", "CHA",
)

Every skill check begins with a roll of a d20 for an element of chance. Modifiers, which can be negative, are then added to the result to determine how well the monster did. The Medusa has a +5 bonus to Deception, which would be added to the roll.

If a skill isn’t listed in the Medusa’s stat block, she can still use it. In this case, she would rely instead on her ability scores. For example, the Medusa isn’t trained in acrobatics, but her high dexterity would give her a slight advantage nevertheless.

Modifiers can be calculated from ability scores with a simple formula, defined below. Note that modifiers can be negative. Zombies, for example, are not known for their high intelligence, and have a history modifier of -4.

modifier <- function(x) {
floor((x - 10) / 2)
}

monster_modifiers <- monster_ability %>%
as.list %>% # preserves list names as column names
as_tibble %>%
mutate_all(modifier) %>% # convert raw ability to modifiers
gather(key = ability_code_upper, value = modifier) # convert to long
monster_modifiers
## # A tibble: 6 x 2
##   ability_code_upper modifier
##   <chr>                 <dbl>
## 1 STR                       0
## 2 DEX                       2
## 3 CON                       3
## 4 INT                       1
## 5 WIS                       1
## 6 CHA                       2
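
As a quick check of the modifier formula, the sketch below applies it to a handful of scores. The function is repeated so the snippet runs on its own; the scores are chosen for illustration (15 and 16 are the Medusa's dexterity and constitution, and a score of 3 produces the -4 mentioned for zombies).

```r
# Same formula as above, repeated so the snippet is self-contained
modifier <- function(x) {
  floor((x - 10) / 2)
}

# Odd scores round down, so 15 gives +2 and 3 gives -4
modifier(c(15, 16, 3, 1, 30))
# 2 3 -4 -5 10
```

Because floor rounds towards negative infinity, scores below 10 round away from zero, which is why a score of 1 gives -5 and not -4.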

We’re going to list every skill modifier for each monster. We start with the base_skills, determined solely by the monster’s ability scores.

base_skills <- skill_ability %>%
left_join(monster_modifiers, by = "ability_code_upper") %>%
select(skill, modifier)
head(base_skills, 6)
## # A tibble: 6 x 2
##   skill           modifier
##   <chr>              <dbl>
## 1 athletics              0
## 2 acrobatics             2
## 3 sleight_of_hand        2
## 4 stealth                2
## 5 arcana                 1
## 6 history                1

Now we find the listed_skills, which are those explicitly provided in the markdown. We use the extract_from_bold_text function, and split the resulting line along the commas into a vector. The words in an element name the skill, while the number gives the modifier.

This chain of piped functions has a peculiar unlist %>% as.list, which seems to be necessary to preserve the vector names. I’d love to do without this code, since it seems very ugly!

listed_skills <- lines %>%
  extract_from_bold_text("Skills") %>%
  strsplit(", ") %>%
  unlist %>%
  lapply(function(x) {
    skill_name <- trimws(gsub("[+-][0-9]+$", "", x)) # text before the modifier, so multi-word skills survive
    skill_modifier <- readr::parse_number(x) # assumed step: the number is the modifier
    names(skill_modifier) <- tolower(skill_name)
    skill_modifier
  }) %>%
  unlist %>% # This is
  as.list %>% # so weird
  as_tibble %>%
  gather(key = skill, value = modifier) %>%
  mutate(skill = gsub(" ", "_", skill)) # keep naming conventions (underscores)
listed_skills
listed_skills
## # A tibble: 4 x 2
##   skill      modifier
##   <chr>         <dbl>
## 1 deception         5
## 2 insight           4
## 3 perception        4
## 4 stealth           5
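
One possible way to avoid the unlist and as.list round trip is to build the two columns directly from the split entries. This is only a sketch in base R, not the approach used in the original script, and it assumes every entry ends in a signed whole number such as "+5" or "-1".

```r
# Toy input: the text that follows the bold "Skills" label
skills_text <- "Deception +5, Insight +4, Perception +4, Stealth +5"

entries <- strsplit(skills_text, ", ", fixed = TRUE)[[1]]

listed <- data.frame(
  # everything before the signed number is the skill name
  skill = gsub(" ", "_", tolower(sub(" [+-][0-9]+$", "", entries))),
  # the signed number at the end is the modifier
  modifier = as.numeric(sub("^.* ([+-][0-9]+)$", "\\1", entries)),
  stringsAsFactors = FALSE
)
listed
```

Since the name is taken as everything before the final number, a multi-word entry such as "Animal Handling +3" would become animal_handling.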

Finally, we combine listed_skills and base_skills, allowing listed skills to override base skills.

monster_skills <- if (length(listed_skills) == 0) {
base_skills
} else {
listed_skills %>% rbind(
anti_join(base_skills, listed_skills, by = "skill")
)
}
monster_skills <- monster_skills[match(base_skills$skill, monster_skills$skill),] # maintain skill order
head(monster_skills, 6)
## # A tibble: 6 x 2
##   skill           modifier
##   <chr>              <dbl>
## 1 athletics              0
## 2 acrobatics             2
## 3 sleight_of_hand        2
## 4 stealth                5
## 5 arcana                 1
## 6 history                1
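
The override rule can also be illustrated with plain named vectors. The sketch below uses a few toy values and is not how the original script stores skills; it only shows that assigning by name replaces the listed skills while leaving the base order untouched.

```r
# Toy values: a few base modifiers from ability scores, and listed overrides
base <- c(athletics = 0, acrobatics = 2, stealth = 2, deception = 2)
listed <- c(deception = 5, stealth = 5)

# Assigning by name replaces only the listed skills and keeps the base order
combined <- base
combined[names(listed)] <- listed
combined
```

One caveat of this shortcut is that a listed name missing from base would be appended as a new element, so the names should be checked first if the input is not trusted.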

## Monster actions

Actions are a tough one. Take a look at the last 5 lines of the markdown:

tail(lines, 5)
## # A tibble: 5 x 1
##   value
##   <chr>
## 1 ### Actions
## 2 **Multiattack.** The medusa makes either three melee attacks--one with i…
## 3 **Snake Hair.** Melee Weapon Attack: +5 to hit, reach 5 ft., one creatur…
## 4 **Shortsword.** Melee Weapon Attack: +5 to hit, reach 5 ft., one target.…
## 5 **Longbow.** Ranged Weapon Attack: +5 to hit, range 150/600 ft., one tar…

We’re going to look for the h3 heading (three hashes) “Actions”. The lines that correspond to actions begin after this ### Actions line. The last action is determined by finding either:

1. the line before the next h3 heading or, failing that,
2. the last line.

We then have a list of lines that correspond to monster actions. We’re going to turn these lines into a named vector, in which the name of the action (taken from the bold text) corresponds to the action text.

header_rows <- which(grepl("###", lines$value), arr.ind = TRUE)
actions_header_row <- which(lines == "### Actions", arr.ind = TRUE)[,"row"]
if (length(actions_header_row) == 0) { # This monster has no actions
  monster_actions <- NA
} else {
  if (max(header_rows) == actions_header_row) {
    last_action <- nrow(lines) # in this case, the actions are the last lines
  } else {
    last_action <- min(header_rows[header_rows > actions_header_row]) - 1 # the row before the heading that comes after ### Actions
  }
  action_rows <- seq(actions_header_row + 1, last_action)
  monster_actions <- lines$value[action_rows]
  monster_actions <- monster_actions %>% purrr::map(function(x) {
    action_name <- gsub(".*\\*\\*(.*?)\\.\\*\\*.*", "\\1", x)
    action <- x %>% strip_bold_text %>% trimws
    names(action) <- action_name
    action
  }) %>% purrr::reduce(c)
}
monster_actions
##                                                                                                                                 Multiattack
## "The medusa makes either three melee attacks--one with its snake hair and two with its shortsword--or two ranged attacks with its longbow."
##                                                                                                                                  Snake Hair
##                  "Melee Weapon Attack: +5 to hit, reach 5 ft., one creature. Hit: 4 (1d4 + 2) piercing damage plus 14 (4d6) poison damage."
##                                                                                                                                  Shortsword
##                                                "Melee Weapon Attack: +5 to hit, reach 5 ft., one target. Hit: 5 (1d6 + 2) piercing damage."
##                                                                                                                                     Longbow
##              "Ranged Weapon Attack: +5 to hit, range 150/600 ft., one target. Hit: 6 (1d8 + 2) piercing damage plus 7 (2d6) poison damage."
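
The Medusa only exercises the second rule, since her actions are the last lines of the file. To show the first rule as well, here is a base R sketch on a made-up monster with a second h3 section after its actions. The lines and variable names are invented for illustration.

```r
# Toy lines: a hypothetical monster with another h3 section after its actions
toy_lines <- c(
  "**Challenge** 1 (200 XP)",
  "### Actions",
  "**Bite.** Melee Weapon Attack: +4 to hit.",
  "**Claw.** Melee Weapon Attack: +4 to hit.",
  "### Legendary Actions",
  "**Detect.** The monster makes a Perception check."
)

header_rows <- grep("^###", toy_lines)
actions_header_row <- which(toy_lines == "### Actions")

# Rule 1: stop at the line before the next h3 heading.
# Rule 2: if there is no later heading, stop at the last line.
later_headers <- header_rows[header_rows > actions_header_row]
last_action <- if (length(later_headers) > 0) {
  min(later_headers) - 1
} else {
  length(toy_lines)
}

toy_lines[seq(actions_header_row + 1, last_action)]
```

The sketch does not handle a file in which "### Actions" is the final line with nothing after it; in that case seq would count backwards, so a real parser would need an extra check.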

## Putting it all together

We can now put everything together into a single-row tibble:

tibble(
  name = lines %>% extract_from_colon_heading("name"),
  type = lines %>% extract_from_colon_heading("type"),
  cr = lines %>% extract_from_colon_heading("cr") %>% as.numeric,
  xp = lines %>% extract_from_bold_text("challenge") %>% extract_bracketed %>% readr::parse_number(),
  ac = lines %>% extract_from_bold_text("Armor Class") %>% readr::parse_number(),
  ac_note = lines %>% extract_from_bold_text("Armor Class") %>% extract_bracketed,
  hp_avg = lines %>% extract_from_bold_text("Hit Points") %>% readr::parse_number(),
  hp = lines %>% extract_from_bold_text("Hit Points") %>% extract_bracketed,
  str = monster_ability["STR"],
  dex = monster_ability["DEX"],
  con = monster_ability["CON"],
  int = monster_ability["INT"],
  wis = monster_ability["WIS"],
  cha = monster_ability["CHA"],
  senses = lines %>% extract_from_bold_text("Senses"),
  languages = lines %>% extract_from_bold_text("Languages"),
  speed = lines %>% extract_from_bold_text("Speed"),
  actions = monster_actions %>% list
) %>%
  bind_cols(spread(monster_skills, skill, modifier)) %>% # assumed step: one column per skill, matching the output below
  as_tibble
## # A tibble: 1 x 36
##   name  type     cr    xp    ac ac_note hp_avg hp      str   dex   con
##   <chr> <chr> <dbl> <dbl> <dbl> <chr>    <dbl> <chr> <dbl> <dbl> <dbl>
## 1 Medu… mons…     6  2300    15 natura…    127 17d8…    10    15    16
## # ... with 25 more variables: int <dbl>, wis <dbl>, cha <dbl>,
## #   senses <chr>, languages <chr>, speed <chr>, actions <list>,
## #   acrobatics <dbl>, animal_handling <dbl>, arcana <dbl>,
## #   athletics <dbl>, deception <dbl>, history <dbl>, insight <dbl>,
## #   intimidation <dbl>, investigation <dbl>, medicine <dbl>, nature <dbl>,
## #   perception <dbl>, performance <dbl>, persuasion <dbl>, religion <dbl>,
## #   sleight_of_hand <dbl>, stealth <dbl>, survival <dbl>

There are a few more fields that I haven’t covered here (size, alignment and description, for example). I’ve put the full version of the parse_monster.R script in a gist.

Of course, this is how to parse just one monster. Fortunately, the purrr package exists. Here’s how to scrape every monster:

1. Clone vitusventure’s 5th edition SRD repository
2. Set the /docs/gamemaster_rules/monsters directory to a variable monster_dir
3. Run the following code:
monsters <- list.files(monster_dir, full.names = TRUE) %>%
purrr::map(parse_monster) %>%
purrr::reduce(rbind)
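
To see the map-then-combine pattern without cloning the repository, here is a self-contained base R sketch. The function parse_monster_stub and the two toy files are hypothetical stand-ins: the real parse_monster lives in the author's gist and returns many more columns.

```r
# Hypothetical stand-in for parse_monster(): reads a file, returns one row
parse_monster_stub <- function(path) {
  lines <- trimws(readLines(path))
  data.frame(
    name = sub("^name: *", "", lines[grepl("^name:", lines)][1]),
    cr = as.numeric(sub("^cr: *", "", lines[grepl("^cr:", lines)][1])),
    stringsAsFactors = FALSE
  )
}

# Two toy files in a temporary directory stand in for monster_dir
monster_dir <- file.path(tempdir(), "toy_monsters")
dir.create(monster_dir, showWarnings = FALSE)
writeLines(c("name: Medusa", "type: monstrosity", "cr: 6"),
           file.path(monster_dir, "medusa.md"))
writeLines(c("name: Frog", "type: beast", "cr: 0"),
           file.path(monster_dir, "frog.md"))

# Parse each markdown file, then stack the one-row results
files <- list.files(monster_dir, pattern = "\\.md$", full.names = TRUE)
monsters_toy <- do.call(rbind, lapply(files, parse_monster_stub))
monsters_toy
```

Filtering on the .md extension is a design choice in this sketch: it keeps stray files in the directory from being passed to the parser.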

## What’s next

A few things are missing here:

• Damage/condition immunities and resistances are not being scraped.
• Monster traits, such as the ability to breathe underwater, are not being scraped. I think this is a matter of finding any bold heading that isn’t “standard” and treating it as a trait.
• Some monsters have complicated armor classes. For example, the werewolf has an AC of “11 in humanoid form, 12 (natural armor) in wolf or hybrid form”. This doesn’t fit the template of ac and ac_note.

I’d like to incorporate the spells in the SRD, as well as some basic mechanics. Imagine being able to generate an encounter in R according to a specific party level!

## Sources

The Medusa and all Dungeons and Dragons 5th edition mechanics are available in the Systems Reference Document under the Open Gaming License Version 1.0a. The monsters data set in the monstr package is available under the same license.