AE 10: Opinion articles in The Chronicle

Suggested answers

Application exercise
Important

These are suggested answers. This document should be used as reference only, it’s not designed to be an exhaustive key.

Part 1 - Data scraping

See chronicle-scrape.R for suggested scraping code.

Part 2 - Data analysis

Let’s start by loading the packages we will need:

  • Load the data you saved into the data folder and name it chronicle.
chronicle <- read_csv("data/chronicle.csv")
  • Who are the most prolific authors of the 500 most recent opinion articles in The Chronicle?
chronicle |>
  count(author, sort = TRUE)
# A tibble: 191 × 2
   author             n
   <chr>          <int>
 1 Luke A. Powery    32
 2 Advikaa Anand     26
 3 Heidi Smith       26
 4 Nik Narain        23
 5 Monday Monday     20
 6 Aaron Siegle      15
 7 Anna Garziera     15
 8 Sonia Green       12
 9 Linda Cao         10
10 Angikar Ghosal     9
# ℹ 181 more rows
  • Draw a line plot of the number of opinion articles published per day in The Chronicle.
chronicle |>
  count(date) |>
  ggplot(aes(x = date, y = n, group = 1)) +
  geom_line()

  • What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title?
chronicle |>
  mutate(
    title = str_to_lower(title),
    climate = if_else(str_detect(title, "climate"), "mentioned", "not mentioned")
    ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        11 0.022
2 not mentioned   489 0.978
  • What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title or abstract?
chronicle |>
  mutate(
    title = str_to_lower(title),
    abstract = str_to_lower(abstract),
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"), 
      "mentioned", 
      "not mentioned"
      )
    ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        16 0.032
2 not mentioned   484 0.968
  • Come up with another question and try to answer it using the data.
# add code here