AE 03: Bechdel + data visualization and transformation

Suggested answers

Application exercise
Important

These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.

In this mini-analysis, we’ll continue our exploration of the bechdel dataset, which contains information on whether the movies in the data pass the Bechdel test (a measure of the representation of women in fiction).

Getting started

Packages

We’ll use the tidyverse package for this analysis.
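Loading the package is a single library() call, assuming tidyverse is already installed. It attaches readr, dplyr, and ggplot2, among others, which supply read_csv(), glimpse(), filter(), and ggplot() used in this exercise.

```r
# Attach the core tidyverse packages (readr, dplyr, ggplot2, and others)
library(tidyverse)
```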

Data

The data are stored as a CSV (comma-separated values) file in your repository’s data folder. Let’s read it from there and save it as an object called bechdel.

bechdel <- read_csv("data/bechdel.csv")

Get to know the data

We can use the glimpse() function to get an overview (or “glimpse”) of the data.

glimpse(bechdel)
Rows: 1,615
Columns: 7
$ title       <chr> "21 & Over", "Dredd 3D", "12 Years a Slave", "2 Guns", "42…
$ year        <dbl> 2013, 2012, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013…
$ gross_2013  <dbl> 67878146, 55078343, 211714070, 208105475, 190040426, 18416…
$ budget_2013 <dbl> 13000000, 45658735, 20000000, 61000000, 40000000, 22500000…
$ roi         <dbl> 5.221396, 1.206305, 10.585703, 3.411565, 4.751011, 0.81851…
$ binary      <chr> "FAIL", "PASS", "FAIL", "FAIL", "FAIL", "FAIL", "FAIL", "P…
$ clean_test  <chr> "notalk", "ok", "notalk", "notalk", "men", "men", "notalk"…
  • What does each observation (row) in the data set represent?

Each observation represents a movie.

  • How many observations (rows) are in the data set?

There are 1615 movies in the dataset.

  • How many variables (columns) are in the data set?

There are 7 variables (columns) in the dataset.
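The row and column counts can also be read off programmatically instead of from the glimpse() header. The sketch below uses a hypothetical stand-in, bechdel_sample, built from the first four rows printed by glimpse(), so it runs without the CSV.

```r
library(tidyverse)

# Hypothetical stand-in for bechdel: the first four rows shown by glimpse()
bechdel_sample <- tribble(
  ~title,             ~year, ~roi,      ~binary, ~clean_test,
  "21 & Over",         2013, 5.221396,  "FAIL",  "notalk",
  "Dredd 3D",          2012, 1.206305,  "PASS",  "ok",
  "12 Years a Slave",  2013, 10.585703, "FAIL",  "notalk",
  "2 Guns",            2013, 3.411565,  "FAIL",  "notalk"
)

nrow(bechdel_sample)  # number of observations (rows)
ncol(bechdel_sample)  # number of variables (columns)
dim(bechdel_sample)   # rows and columns together
```

On bechdel itself, nrow() and ncol() should return 1615 and 7, the same figures glimpse() reports.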

Bechdel test results

Visualizing data with ggplot2

Create a bar plot of the clean_test variable:

  • ok = passes test
  • dubious
  • men = women only talk about men
  • notalk = women don’t talk to each other
  • nowomen = fewer than two women
ggplot(bechdel, aes(x = clean_test)) +
  geom_bar() +
  labs(
    x = "Bechdel test result",
    y = "Count"
  )

Which are more common: movies that pass the test or movies that do not?

If we count “dubious” as not passing (as the binary variable does), movies that fail the test (dubious, men, notalk, and nowomen combined) are more common than movies that pass (ok).
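To check the answer with numbers rather than bar heights, count() tabulates each category and mutate() can turn the counts into proportions. This sketch runs on a hypothetical seven-row stand-in assembled from movies printed elsewhere in this exercise, so its totals are not the dataset's; swapping bechdel_sample for bechdel gives the real counts.

```r
library(tidyverse)

# Hypothetical stand-in for bechdel: movies printed in this exercise
bechdel_sample <- tribble(
  ~title,                    ~binary, ~clean_test,
  "21 & Over",               "FAIL",  "notalk",
  "Dredd 3D",                "PASS",  "ok",
  "12 Years a Slave",        "FAIL",  "notalk",
  "2 Guns",                  "FAIL",  "notalk",
  "Paranormal Activity",     "FAIL",  "dubious",
  "The Blair Witch Project", "PASS",  "ok",
  "El Mariachi",             "FAIL",  "nowomen"
)

# The count behind each bar of the plot
bechdel_sample |>
  count(clean_test)

# Pass/fail totals and their proportions, using the binary variable
bechdel_sample |>
  count(binary) |>
  mutate(prop = n / sum(n))
```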

Render, commit, and push

  1. Render your Quarto document.

  2. Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.

  3. Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.

  4. Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!

Return-on-investment

Let’s take a look at return-on-investment (ROI) for movies that do and do not pass the Bechdel test.

Step 1 - Your turn

Create side-by-side box plots of roi by clean_test where the boxes are colored by binary.

ggplot(bechdel, aes(x = roi, y = clean_test, color = binary)) +
  geom_boxplot() +
  labs(
    x = "Return on investment",
    y = "Bechdel test result",
    color = "Pass / Fail"
  )

Step 2 - Demo

What are the movies with very high returns on investment?

bechdel |>
  filter(roi > 400)
# A tibble: 3 × 7
  title                    year gross_2013 budget_2013   roi binary clean_test
  <chr>                   <dbl>      <dbl>       <dbl> <dbl> <chr>  <chr>     
1 Paranormal Activity      2007  339424558      505595  671. FAIL   dubious   
2 The Blair Witch Project  1999  543776715      839077  648. PASS   ok        
3 El Mariachi              1992    6778946       11622  583. FAIL   nowomen   
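filter(roi > 400) depends on choosing a cutoff in advance. An alternative that needs no threshold sorts by roi with arrange(desc()) and keeps the top rows with slice_head(); dplyr's slice_max(roi, n = 3) does much the same in one step (it keeps ties by default). The stand-in data below come from rows printed in this exercise, and computing roi as gross_2013 / budget_2013 is an assumption that happens to reproduce every roi value shown here.

```r
library(tidyverse)

# Hypothetical stand-in for bechdel, using rows printed in this exercise
bechdel_sample <- tribble(
  ~title,                    ~gross_2013, ~budget_2013, ~clean_test,
  "21 & Over",                  67878146,     13000000, "notalk",
  "Dredd 3D",                   55078343,     45658735, "ok",
  "12 Years a Slave",          211714070,     20000000, "notalk",
  "Paranormal Activity",       339424558,       505595, "dubious",
  "The Blair Witch Project",   543776715,       839077, "ok",
  "El Mariachi",                 6778946,        11622, "nowomen"
) |>
  # Assumption: roi = gross / budget, which matches the printed roi values
  mutate(roi = gross_2013 / budget_2013)

# Rank movies from highest to lowest ROI and keep the top three
bechdel_sample |>
  arrange(desc(roi)) |>
  select(title, roi, clean_test) |>
  slice_head(n = 3)
```

On bechdel, the top three rows should be the same three movies returned by filter(roi > 400), since no other movie exceeds that cutoff.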

Step 3 - Demo

Expand on your plot from Step 1 to zoom in on movies with roi < ___ and get a better view of how the medians compare across categories.

ggplot(bechdel, aes(x = roi, y = clean_test, color = binary)) +
  geom_boxplot() +
  labs(
    x = "Return on investment",
    y = "Bechdel test result",
    color = "Pass / Fail"
  ) +
  coord_cartesian(xlim = c(0, 16))
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
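Two details explain the warning and the choice of coord_cartesian(). ggplot2 drops rows whose roi is not finite (for example NA) before computing the boxes, so the 15 removed rows presumably have missing roi values; coord_cartesian() itself only zooms the view and removes nothing. Setting the range with xlim(0, 16) instead would discard observations outside that range before the medians are calculated, changing the very boxes being compared. The base R sketch below illustrates both points on a hypothetical vector built from ROI values in this exercise, plus one NA.

```r
# Hypothetical ROI values: three glimpse() rows, Paranormal Activity, and one NA
roi <- c(5.221396, 1.206305, 10.585703, 339424558 / 505595, NA)

# Non-finite values are what stat_boxplot() reports as removed
sum(!is.finite(roi))

# coord_cartesian() keeps every finite value, so the median uses all of them
median(roi, na.rm = TRUE)

# xlim(0, 16) would drop values outside the range first, shifting the median
median(roi[roi >= 0 & roi <= 16], na.rm = TRUE)
```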

What does this plot say about return-on-investment for movies that pass the Bechdel test?

Movies that pass the Bechdel test tend to have a higher median return-on-investment than those that do not.
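The box plot compares medians visually; group_by() and summarize() report them as numbers. The sketch runs on a hypothetical stand-in built from rows printed in this exercise (roi for the last three rows computed as gross over budget, an assumption that matches the printed values), so only the code, not the resulting medians, carries over to bechdel.

```r
library(tidyverse)

# Hypothetical stand-in for bechdel: rows printed in this exercise
bechdel_sample <- tribble(
  ~title,                    ~roi,               ~binary, ~clean_test,
  "21 & Over",               5.221396,           "FAIL",  "notalk",
  "Dredd 3D",                1.206305,           "PASS",  "ok",
  "12 Years a Slave",        10.585703,          "FAIL",  "notalk",
  "2 Guns",                  3.411565,           "FAIL",  "notalk",
  "Paranormal Activity",     339424558 / 505595, "FAIL",  "dubious",
  "The Blair Witch Project", 543776715 / 839077, "PASS",  "ok",
  "El Mariachi",             6778946 / 11622,    "FAIL",  "nowomen"
)

# Median ROI and group size for each pass/fail group (na.rm skips missing roi)
bechdel_sample |>
  group_by(binary) |>
  summarize(
    median_roi = median(roi, na.rm = TRUE),
    n = n()
  )

# The same summary broken down by the detailed test result
bechdel_sample |>
  group_by(binary, clean_test) |>
  summarize(median_roi = median(roi, na.rm = TRUE), .groups = "drop")
```

Medians suit this comparison because a handful of extreme values, such as the three movies with ROI above 400, would pull group means far upward.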

Render, commit, and push

  1. Render your Quarto document.

  2. Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.

  3. Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.

  4. Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!