Tidy Tuesday: Simpsons Guest Stars

Tidy Tuesday
Vintage

Dive back in and remember the good ol’ days with this ancient post about The Simpsons from when I was barely out of college.

Author

Brad Hill

Published

September 2, 2019

Editor’s Note

What you’re about to witness is a very old post from the first time I tried to get a data science blog together. Some of the code may not work, some of the ideas may be bad/boring, some of the jokes will be stale. It was a simpler time. I’m leaving this here for a few reasons:

  1. I think it’s important to keep old work around, for both information preservation and to show how far you’ve grown
  2. I need inspiration to do this again, and seeing old stuff from me is helpful
  3. The Simpsons is still on TV

Well, we did it. We made it nearly 2 weeks with only one half-assed introduction post on here, and let me tell you, a lot has happened. Taylor Swift dropped Lover, Joe Walsh has decided his next solo album is going to run against Trump in 2020, and Andrew Yang released his climate change plan that includes “taking the country and pushing it somewhere else!” With so many big important things happening, and me being so slow I’m posting it the day before next week’s set, it’s the perfect opportunity to jump into this week’s Tidy Tuesday dataset:

Simpsons Guest Stars

Let’s go ahead and dive in.

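library(tidyverse)  # readr, dplyr, tidyr, stringr
library(magrittr)   # provides the %<>% assignment pipe used later
simpsons <- readr::read_delim("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-27/simpsons-guests.csv", delim = "|", quote = "", show_col_types = FALSE)
glimpse(simpsons)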
Rows: 1,386
Columns: 6
$ season          <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ number          <chr> "002–102", "003–103", "003–103", "006–106", "006–106",…
$ production_code <chr> "7G02", "7G03", "7G03", "7G06", "7G06", "7G09", "7G07"…
$ episode_title   <chr> "Bart the Genius", "Homer's Odyssey", "Homer's Odyssey…
$ guest_star      <chr> "Marcia Wallace", "Sam McMurray", "Marcia Wallace", "M…
$ role            <chr> "Edna Krabappel;  Ms. Melon", "Worker", "Edna Krabappe…

Well, it looks like this is a pretty low-dimension dataset (6 variables), so we may get more out of just taking a look at the first ten rows than we would trying to mess with glimpse.

head(simpsons, n = 10)
# A tibble: 10 × 6
   season number  production_code episode_title            guest_star      role 
   <chr>  <chr>   <chr>           <chr>                    <chr>           <chr>
 1 1      002–102 7G02            Bart the Genius          Marcia Wallace  Edna…
 2 1      003–103 7G03            Homer's Odyssey          Sam McMurray    Work…
 3 1      003–103 7G03            Homer's Odyssey          Marcia Wallace  Edna…
 4 1      006–106 7G06            Moaning Lisa             Miriam Flynn    Ms. …
 5 1      006–106 7G06            Moaning Lisa             Ron Taylor      Blee…
 6 1      007–107 7G09            The Call of the Simpsons Albert Brooks   Cowb…
 7 1      008–108 7G07            The Telltale Head        Marcia Wallace  Edna…
 8 1      009–109 7G11            Life on the Fast Lane    Albert Brooks   Jacq…
 9 1      010–110 7G10            Homer's Night Out        Sam McMurray    Gull…
10 1      011–111 7G13            The Crepes of Wrath      Christian Coff… Gend…

A couple of things stand out immediately to me.

  • number looks like it holds both the overall series episode number (before the dash) and the season-episode pairing (after the dash)
  • role can hold multiple characters, separated by semicolons (;)
  • production_code can probably be dropped or ignored

Before I do anything, I want to split up that number column. I’m going to make an arbitrary rule that any two-part episodes (for instance, Season 28’s The Great Phatsby) are going to use the first part’s episode number. I’m going to also ignore the movie for the time being.

I also want to make sure we follow tidy guidelines, so I’m going to split up that role column and have a separate entry for each role, even if it’s the same person playing the roles in the same episode. That is going to help with summarizing things as we go along.

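simpsons %<>%
  # Drop the movie, which has no episode number
  filter(number != "M1") %>%
  # Keep only the part before any ";" (the first part of a two-part episode)
  mutate(number = str_remove(number, ";.*$")) %>%
  # Split on the en dash into overall episode number and season-episode combo
  separate(number, c("ep_no", "sep_combo"), sep = "–", remove = FALSE) %>%
  # The last two digits of the combo are the episode number within the season
  mutate(sep_no = str_sub(sep_combo, -2, -1),
         role = strsplit(role, ";")) %>%
  # One row per role
  unnest(role) %>%
  mutate(role = str_trim(role),
         season = as.numeric(season),
         ep_no = as.numeric(ep_no),
         sep_no = as.numeric(sep_no),
         sep_combo = as.numeric(sep_combo))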

Full disclosure, I was a bit unsure about the best way to separate that role column when I didn’t know the number of characters, so I jacked this answer from Stack Overflow.
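For what it's worth, tidyr also has a helper that does the split-and-unnest in one step. This is only a sketch on a couple of toy rows shaped like the data above (the `toy` and `toy_long` names are mine, not part of the original analysis), and it isn't what the post actually ran:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy rows mimicking the shape of the guest-star data (not the full dataset)
toy <- tibble(
  guest_star = c("Marcia Wallace", "Sam McMurray"),
  role       = c("Edna Krabappel;  Ms. Melon", "Worker")
)

# separate_rows() splits on the delimiter and emits one row per piece,
# no matter how many roles are packed into a single cell
toy_long <- toy %>%
  separate_rows(role, sep = ";") %>%
  mutate(role = str_trim(role))

toy_long
```

The result should have the same shape as the `strsplit()` plus `unnest()` combo, with one row per guest star and role.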

Episodes from First to Last

The Simpsons has been on for a long time. It’s been on longer than I’ve been alive. It’s been on longer than Old Town Road was at the top of the Billboard Hot 100. It’s been on longer than I’ve wanted to fit a “rule of 3” joke into a blog post. Naturally, some of these guest stars have been mainstays in the series, so let’s look at some of the longest standing guest stars and roles.

Clearly we’ve got a couple of powerhouses here in the guest star longevity category. Marcia Wallace, Jon Lovitz, and Jackie Mason all clock in with more than 620 episodes between their first and last appearances. That doesn’t mean they are the most frequent guests, however. The Simpsons is such an institution that the series can get big names for cutaways and gags; many of these guest stars have such a long gap between their first and last episodes not because they are an integral part of the series, but because they appear in one-off jokes.
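The chart code didn't survive, but the longevity number is just the gap between a guest's first and last overall episode number. Here is a minimal sketch on made-up data (the guest names and episode numbers are placeholders), with an episode count alongside to show why a long span and frequent appearances are different things:

```r
library(dplyr)

# Toy data: one row per guest appearance; ep_no is the overall episode number
# (all values here are invented for illustration)
toy <- tibble(
  guest_star = c("Guest A", "Guest A", "Guest A", "Guest B", "Guest B"),
  ep_no      = c(2, 300, 630, 10, 12)
)

longevity <- toy %>%
  group_by(guest_star) %>%
  summarize(
    first_ep = min(ep_no),
    last_ep  = max(ep_no),
    span     = last_ep - first_ep,   # episodes between first and last appearance
    n_eps    = n_distinct(ep_no)     # how many episodes they actually appeared in
  ) %>%
  arrange(desc(span))

longevity
```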

Episodes per Guest Star

While Marcia Wallace is clearly a recurring guest star, Phil Hartman, who has been in the second most episodes, didn’t even make it into the top 25 for longevity. Jon Lovitz and Jackie Mason, on the other hand, appear in 15 and 9 episodes respectively.

Recurring Guest Characters

Next, let’s take a look at which characters have appeared in the most Simpsons episodes.

No surprise here, Edna Krabappel, a character played by Marcia Wallace, has appeared in the most episodes, clocking in at nearly 175, with the second most recurrent character not even hitting the 30 episode mark. An interesting point that was mentioned in the cleaning process rears its head again in this chart, though. We see here that Phil Hartman, who you’ll recall placed second in guest appearances, lands twice in the top 10 character appearance list.

Busiest Simpsons Guest Stars

This leads me to my next question: which guest star is voicing the most characters?

That’s right, ladies and gentlemen. The most prolific guest voice actor on The Simpsons is none other than The Brain of Pinky & The Brain fame! Maurice LaMarche voices 35 guest characters in the series.
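Both of the last two counts fall out of the one-row-per-role layout from the cleaning step. A rough sketch on placeholder data (the names and numbers below are invented, and this is not the original chart code):

```r
library(dplyr)

# Toy data in the cleaned shape: one row per guest star, role, and episode
toy <- tibble(
  guest_star = c("Guest A", "Guest A", "Guest A", "Guest B", "Guest B"),
  role       = c("Role 1", "Role 1", "Role 2", "Role 3", "Role 3"),
  ep_no      = c(1, 2, 2, 3, 4)
)

# Episodes per character (keeping the actor alongside the character)
char_eps <- toy %>%
  group_by(role, guest_star) %>%
  summarize(n_eps = n_distinct(ep_no), .groups = "drop") %>%
  arrange(desc(n_eps))

# Distinct characters voiced per guest star
roles_per_guest <- toy %>%
  group_by(guest_star) %>%
  summarize(n_roles = n_distinct(role)) %>%
  arrange(desc(n_roles))
```

Splitting the role column first is what makes `n_distinct(role)` meaningful here; on the raw column, "Edna Krabappel;  Ms. Melon" would count as a single character.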

Enough with the Bar Charts Already

Yeah, I know. It was a lot. Lots of counts by names, and not a whole lot more. Let’s see what we can do without bar charts, because I mean, oof.

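library(rvest)  # read_html(), html_nodes(), html_text()

# Scrape each season's episode count from the first table on its Wikipedia page.
# The absolute XPath depends on the page layout at the time of writing and may no longer match.
num_eps <- list()
for (i in 1:30) {
  num_eps$season[i] <- i
  num_eps$eps[i] <- read_html(sprintf("https://en.wikipedia.org/wiki/The_Simpsons_(season_%s)", i)) %>%
    html_nodes(xpath = "/html/body/div[3]/div[3]/div[5]/div[1]/table[1]/tbody/tr[5]/td") %>%
    html_text() %>%
    as.numeric()
}
num_eps <- bind_cols(season = num_eps$season, eps = num_eps$eps)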

I noticed a lot of people on Twitter going the “number of guests per season” route, and I like that, but I also wanted to throw in an average guests per episode metric, because some seasons have fewer episodes than others (the first being the most obvious, with only 13 episodes). As the chart above indicates, the trend is still the same, and the two metrics track together, but it does show that the change is not nearly as drastic as it looks in the first few seasons. Also worth noting: I had to go grab the number of episodes per season from Wikipedia, because the final episode number would not appear in our data if there was no guest appearance. That did give me another idea, though. Why not get even more season and episode information from Wikipedia so we can take a quick look at ratings before I wrap this up?
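The guests-per-episode metric is just a join and a division. A small sketch with stand-in data: only season 1's 13 episodes comes from this post, and the guest rows and the second season's count are placeholders standing in for the cleaned data and the scraped `num_eps` table:

```r
library(dplyr)

# Placeholder guest rows: one row per guest appearance
guests <- tibble(
  season     = c(1, 1, 1, 2, 2, 2, 2),
  guest_star = c("A", "B", "C", "A", "D", "E", "F")
)

# Stand-in for the scraped episode counts
num_eps_toy <- tibble(season = c(1, 2), eps = c(13, 22))

per_season <- guests %>%
  count(season, name = "n_guests") %>%          # guest appearances per season
  left_join(num_eps_toy, by = "season") %>%     # attach episodes per season
  mutate(guests_per_ep = n_guests / eps)        # normalize by season length

per_season
```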

Even More Season and Episode Information from Wikipedia

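library(rvest)      # read_html(), html_nodes(), html_table()
library(lubridate)  # ymd()

# Scrape the episode table for each season.
# As above, the absolute XPath is tied to the page layout and may need updating.
season_info <- list()
for (i in 1:30) {
  season_info[[i]] <-
    read_html(sprintf("https://en.wikipedia.org/wiki/The_Simpsons_(season_%s)", i)) %>%
    html_nodes(xpath = "/html/body/div[3]/div[3]/div[5]/div[1]/table[2]") %>%
    html_table() %>%
    `[[`(1) %>%
    as_tibble() %>%
    # Each episode row is followed by a description row; pull it up alongside
    mutate(desc = lead(No.overall, 1)) %>%
    # Keep only rows whose overall number is numeric (the episode rows)
    filter(!is.na(as.numeric(No.overall))) %>%
    janitor::clean_names() %>%
    mutate(title = str_remove_all(title, '"'),
           desc = str_remove_all(desc, '"'),
           original_air_date = ymd(str_extract(original_air_date, "[0-9]{4}-[0-9]{2}-[0-9]{2}")),
           season = i)
}
season_info <- bind_rows(season_info) %>%
  # Strip footnote markers like "[12]" before converting viewers to numeric
  mutate(u_s_viewers_millions = as.numeric(str_remove(u_s_viewers_millions, "\\[.*\\]")),
         no_overall = as.numeric(no_overall),
         no_inseason = as.numeric(no_inseason))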

Well, that’s sad to see. Granted, the show has been on for 30 years. It’s a wonder it’s still around, but at this point it could probably have 3 viewers a week and still make it on air. Plus, I’d imagine that The Simpsons is a victim of its own success. If The Simpsons hadn’t been as successful as it was, Family Guy, American Dad, Bob’s Burgers, etc. wouldn’t even exist, much less be relatively successful. A surprising trend here is mid-season spikes in ratings, especially from season 21 to present day.

My first thought was that it was perhaps just a function of a mid-season hiatus, but that doesn’t really appear to be all that accurate.

Ratings do seem to improve with a 3-week hiatus before the episode airs, but if it were as simple as long breaks boosting viewership, there wouldn’t be a drop-off after a 4-week (month-long) hiatus. Maybe 3 weeks is that sweet spot between “Oh, right, The Simpsons hasn’t been on in a while!” and “Have they finally canceled The Simpsons?”
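The hiatus length is the gap between an episode's air date and the previous one's. A minimal sketch, assuming gaps are measured within a season and using column names like those in `season_info`; the dates and viewer numbers below are invented:

```r
library(dplyr)

# Toy episodes: made-up air dates and viewer counts
toy <- tibble(
  season               = 1,
  original_air_date    = as.Date(c("2019-01-06", "2019-01-13", "2019-02-03", "2019-02-10")),
  u_s_viewers_millions = c(5.0, 4.8, 5.6, 4.9)
)

hiatus <- toy %>%
  arrange(season, original_air_date) %>%
  group_by(season) %>%
  # Subtracting Dates gives days; divide by 7 to get weeks since the previous episode
  mutate(weeks_off = as.numeric(original_air_date - lag(original_air_date)) / 7) %>%
  ungroup()

# Average viewers by length of break (the season premiere has no previous episode)
hiatus %>%
  filter(!is.na(weeks_off)) %>%
  group_by(weeks_off) %>%
  summarize(avg_viewers = mean(u_s_viewers_millions))
```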

Okay, so what is it then?

season  ep_no  title                      air_date    viewers_mil
5       5      Treehouse of Horror IV     1993-10-28  24.00
6       6      Treehouse of Horror V      1994-10-30  22.20
17      4      Treehouse of Horror XVI    2005-11-06  11.63
19      5      Treehouse of Horror XVIII  2007-11-04  11.70
20      4      Treehouse of Horror XIX    2008-11-02  12.48

Chasing that down the rabbit hole, I found that 6 season highs were Treehouse of Horror episodes. Who knew that Simpsons fans were so enamored with Halloween? Or maybe they’re just enamored with holiday-themed episodes. The problem with that idea is that there’s only one season high that aired in December.

season  ep_no  title                               air_date    viewers_mil
11      12     The Mansion Family                  2000-01-23  11.30
12      9      HOMR                                2001-01-07  18.50
15      9      I, (Annoyed Grunt)-bot              2004-01-11  16.30
18      10     The Wife Aquatic                    2007-01-07  13.90
21      10     Once Upon a Time in Springfield     2010-01-10  14.62
22      10     Moms I’d Like to Forget             2011-01-09  12.60
23      11     The D’oh-cial Network               2012-01-15  11.48
24      9      Homer Goes to Prep School           2013-01-06  8.97
25      9      Steal This Episode                  2014-01-05  12.04
26      10     The Man Who Came to Be Dinner       2015-01-04  10.62
27      11     Teenage Mutant Milk-Caused Hurdles  2016-01-10  8.33
28      11     Pork and Burns                      2017-01-08  8.19
29      11     Frink Gets Testy                    2018-01-14  8.04
30      12     The Girl on the Bus                 2019-01-13  8.20

January, however, is a surprisingly good month for The Simpsons, with the first or second episode of the new year being the season high 13 times, including the most recent 9 seasons. Who knows? Maybe Americans everywhere are getting together at Christmas and reminiscing about the days when The Simpsons wasn’t on its last legs. Everyone gets blurry-eyed, goes home, and watches the first episode of the new year before kicking off the cycle all over again next year.
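For the curious, the season-high lookups behind those two tables boil down to taking the top-rated episode per season and then checking the month it aired. A sketch on invented data (column names follow `season_info`, the titles and numbers are placeholders, and `slice_max()` needs dplyr 1.0.0 or later):

```r
library(dplyr)

# Toy ratings: two seasons, three episodes each (all values made up)
toy <- tibble(
  season               = c(1, 1, 1, 2, 2, 2),
  title                = c("Ep A", "Ep B", "Ep C", "Ep D", "Ep E", "Ep F"),
  original_air_date    = as.Date(c("2000-11-05", "2000-12-17", "2001-01-07",
                                   "2001-11-04", "2001-12-16", "2002-01-06")),
  u_s_viewers_millions = c(10, 9, 12, 11, 8, 9.5)
)

season_highs <- toy %>%
  group_by(season) %>%
  slice_max(u_s_viewers_millions, n = 1) %>%   # most-watched episode per season
  ungroup() %>%
  mutate(air_month = format(original_air_date, "%m"))

# How many season highs land in each month
season_highs %>% count(air_month)
```

Filtering `season_highs` on the title or on `air_month` would then give Treehouse of Horror and January tables like the ones above.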

If there’s one thing I know though, The Simpsons was around before I was here, and it’ll be around long after I’m gone.

Diamonds aren’t forever, folks.

Homer Simpson is.