Lecture 2
Duke University
STA 199 - Fall 2024
September 3, 2024
Office hours are posted on the course website!
If you can follow along with today’s application exercise steps, great! If something doesn’t work as expected, ask me or a TA during the exercise. We’ll either:
Last time:
We introduced you to the course toolkit.
You cloned your ae repositories and started making some updates in your Quarto documents.
You did not commit and push your changes back.
Today:
You will commit your changes from last time and push them to wrap up that application exercise.
We will introduce data visualization.
You will pull to get today’s application exercise file.
You will work on the new application exercise on data visualization, commit your changes, and push them.
ae-01-meet-the-penguins
Go to RStudio, confirm that you’re in the ae project, and open the document ae-01-meet-the-penguins.qmd.
Once we made changes to our Quarto document, we
went to the Git pane in RStudio
staged our changes by clicking the checkboxes next to the relevant files
committed our changes with an informative commit message
pushed our changes to our application exercise repos
confirmed on GitHub that we could see our changes pushed from RStudio
Grab one before you leave!
Remember this visualization from the first day of class?
how the sausage is made!
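library(tidyverse) # dplyr verbs: inner_join(), filter(), mutate(), summarize()
library(unvotes)   # data frames: un_votes, un_roll_calls, un_roll_call_issues
library(lubridate) # year()

us_uk_tr_votes <- un_votes |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid", relationship = "many-to-many") |>
  filter(country %in% c("United Kingdom", "United States", "Turkey")) |>
  mutate(year = year(date)) |>
  group_by(country, year, issue) |>
  summarize(percent_yes = mean(vote == "yes"), .groups = "drop")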
Note
Let’s set these details aside for a bit; we’ll revisit this code later in the semester. For now, let’s agree that we need to do some “data wrangling” to get the data into the right format for the plot we want to create. Just note that we called the data frame we’ll visualize us_uk_tr_votes.
Map year to the x aesthetic
Map percent_yes to the y aesthetic
Aesthetics are visual properties of a plot
In the grammar of graphics, variables from the data frame are mapped to aesthetics
It’s common practice in R to omit the names of the first two arguments of a function:
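A minimal sketch of the mapping step, written once with named arguments and once without. The data frame votes_sketch is a hypothetical stand-in for us_uk_tr_votes with made-up values, used only so the code runs without the wrangling step. Since data and mapping are the first two arguments of ggplot(), both calls should set up the same plot.

```r
library(ggplot2)

# Hypothetical stand-in for us_uk_tr_votes: made-up values, not real UN voting data.
votes_sketch <- data.frame(
  year = c(1990, 2000, 2010),
  percent_yes = c(0.60, 0.55, 0.50)
)

# Named arguments: data first, then the aesthetic mapping.
p_named <- ggplot(data = votes_sketch, mapping = aes(x = year, y = percent_yes))

# Same call with the names of the first two arguments omitted.
p_positional <- ggplot(votes_sketch, aes(x = year, y = percent_yes))

# Display the (still empty) plot: axes only, because no geom has been added yet.
p_positional
```

At this point the plot has axes for year and percent_yes but draws no data, because no geom has been added.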
with a geom
Map country to the color aesthetic
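A sketch of these two steps together, assuming the first geom is geom_point() (the text here does not name it). The data frame votes_sketch is again a hypothetical stand-in for us_uk_tr_votes with made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for us_uk_tr_votes: made-up values, not real UN voting data.
votes_sketch <- data.frame(
  country = rep(c("United Kingdom", "United States", "Turkey"), each = 3),
  year = rep(c(1990, 2000, 2010), times = 3),
  percent_yes = c(0.60, 0.55, 0.50, 0.30, 0.25, 0.35, 0.70, 0.65, 0.75)
)

# Map country to color in aes(), then add a geom so the data are drawn.
# geom_point() is an assumption about which geom the slide used.
p <- ggplot(votes_sketch, aes(x = year, y = percent_yes, color = country)) +
  geom_point()

p
```

Because color is mapped inside aes(), ggplot2 picks one color per country and adds a legend automatically.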
with another geom
geom_smooth() resulted in the following message: `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
with alpha
with se = FALSE
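A sketch of the plot with both geoms, under a few assumptions: the points come from geom_point(), alpha is set on the points, and 0.5 is an illustrative value rather than the one used in class. The data frame votes_sketch is a hypothetical stand-in for us_uk_tr_votes filled with random made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for us_uk_tr_votes: random made-up values, not real UN voting data.
set.seed(199)
votes_sketch <- expand.grid(
  country = c("United Kingdom", "United States", "Turkey"),
  year = 1980:2019
)
votes_sketch$percent_yes <- runif(nrow(votes_sketch), min = 0.2, max = 0.8)

p <- ggplot(votes_sketch, aes(x = year, y = percent_yes, color = country)) +
  # alpha makes the points semi-transparent (0.5 is an assumed value).
  geom_point(alpha = 0.5) +
  # se = FALSE drops the shaded confidence band around each smooth curve.
  # Naming method and formula explicitly avoids the informational message.
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

p
```

Leaving out method and formula gives the same curves here; ggplot2 then reports the defaults it chose, which is the message quoted above.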
We built a plot layer-by-layer
ae-02-bechdel-dataviz
ae project in RStudio.
ggplot().
+s.