Lecture 3
Duke University
STA 199 - Fall 2024
September 5, 2024
AE 01 and AE 02 suggested answers posted on the course website.
Ed Discussion posts:
Monday’s lab:
ae-02-bechdel-dataviz
Go to RStudio, confirm that you’re in the ae project, and open the document ae-02-bechdel-dataviz.qmd.
Cell labels are helpful for describing what the code is doing, for jumping between code cells in the editor, and for troubleshooting.
message: false hides any messages emitted by the code in your rendered document.
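A minimal sketch of how both options might look at the top of an R code cell. The label name `load-packages` is a hypothetical choice. In a .qmd file the cell would open with `{r}` after the three backticks; run as plain R, the `#|` lines are ordinary comments.

```r
#| label: load-packages
#| message: false

# Attaching dplyr normally prints start-up messages about masked functions
# (e.g. filter and lag from stats); with message: false, Quarto leaves
# those messages out of the rendered document.
library(dplyr)
```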
ae-03-bechdel-data-viz-transform
Go to your ae project in RStudio.
Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-03-bechdel-data-viz-transform.qmd.
Work through the application exercise in class, and render, commit, and push your edits by the end of class.
Start with the bechdel data frame, keep the movies with roi greater than 400 (gross is more than 400 times budget), and display the columns title, roi, budget_2013, gross_2013, year, and clean_test:
# A tibble: 3 × 6
  title                    roi budget_2013 gross_2013  year clean_test
  <chr>                  <dbl>       <dbl>      <dbl> <dbl> <chr>
1 Paranormal Activity     671.      505595  339424558  2007 dubious
2 The Blair Witch Proje…  648.      839077  543776715  1999 ok
3 El Mariachi             583.       11622    6778946  1992 nowomen
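A pipeline along these lines could produce output of that shape. This is a sketch, not the course's official answer: it assumes dplyr is installed and swaps in a small hypothetical tibble, `movies`, with made-up values in place of the real bechdel data.

```r
library(dplyr)

# Hypothetical toy data that borrows the bechdel column names;
# the titles and numbers are made up for illustration.
movies <- tibble::tibble(
  title       = c("Movie A", "Movie B", "Movie C"),
  roi         = c(520, 12.5, 410),
  budget_2013 = c(1e6, 2e7, 5e5),
  gross_2013  = c(5.2e8, 2.5e8, 2.05e8),
  year        = c(2001, 2010, 1995),
  clean_test  = c("ok", "men", "notalk")
)

high_roi <- movies |>
  # keep movies whose gross is more than 400 times their budget
  filter(roi > 400) |>
  # keep only the six columns of interest, in this order
  select(title, roi, budget_2013, gross_2013, year, clean_test)

high_roi
```

Movie B is dropped because its roi (12.5) is below 400.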
|>
The pipe operator passes what comes before it into the function that comes after it as the first argument in that function.
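A small base-R illustration of that rule (no packages needed, though the native pipe requires R 4.1 or later):

```r
x <- c(1, 4, 9)

# Equivalent calls: |> inserts x as the first argument of sqrt()
sqrt(x)
x |> sqrt()

# Other arguments stay where they are; the piped value fills the first slot,
# so this is the same as round(3.14159, digits = 2)
3.14159 |> round(digits = 2)
```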
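The pipe is easy to confuse with +, which ggplot2 uses to add layers to a plot. A hedged sketch of the difference, assuming dplyr and ggplot2 are installed and using a hypothetical toy tibble with made-up values in place of bechdel:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values are made up for illustration.
movies <- tibble::tibble(
  budget_2013 = c(1e6, 2e7, 5e5),
  gross_2013  = c(5.2e8, 2.5e8, 2.05e8),
  binary      = c("PASS", "FAIL", "FAIL")
)

# |> passes a data frame from one function to the next ...
failing <- movies |>
  filter(binary == "FAIL")

# ... while + adds layers to a plot once ggplot() has been called.
# Using |> before geom_point() instead of + would pass the plot object
# to geom_point() as its first argument, which is an error.
p <- failing |>
  ggplot(aes(x = budget_2013, y = gross_2013)) +
  geom_point()

# print(p) draws the scatterplot
```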
Find movies that pass the Bechdel test and display their titles and ROIs in descending order of ROI.
Start with the bechdel data frame:
Filter for rows where binary is equal to "PASS":
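A minimal sketch of the filtering step, assuming dplyr is installed; `movies` is a hypothetical stand-in for bechdel with made-up titles and values.

```r
library(dplyr)

# Hypothetical toy data; titles and values are made up.
movies <- tibble::tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  roi    = c(3.2, 1.1, 7.5),
  binary = c("PASS", "FAIL", "PASS")
)

# == tests for equality; writing a single = here makes filter() error
passing <- movies |>
  filter(binary == "PASS")

passing
```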
Arrange the rows in descending order of roi:
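Continuing the sketch with the same hypothetical toy data (dplyr assumed installed), `arrange()` sorts in ascending order by default, and wrapping the column in `desc()` reverses that:

```r
library(dplyr)

# Hypothetical toy data; titles and values are made up.
movies <- tibble::tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  roi    = c(3.2, 1.1, 7.5),
  binary = c("PASS", "FAIL", "PASS")
)

passing_by_roi <- movies |>
  filter(binary == "PASS") |>
  # desc() flips the default ascending order, so the largest roi comes first
  arrange(desc(roi))

passing_by_roi
```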
Select columns title and roi:
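Putting the steps together, the whole pipeline might look like the sketch below. It assumes dplyr is installed and uses a hypothetical toy tibble, `movies`, with made-up values rather than the real bechdel data, so it is not the course's official answer.

```r
library(dplyr)

# Hypothetical toy data; titles and values are made up.
movies <- tibble::tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  roi    = c(3.2, 1.1, 7.5),
  binary = c("PASS", "FAIL", "PASS")
)

result <- movies |>
  filter(binary == "PASS") |>   # movies that pass the Bechdel test
  arrange(desc(roi)) |>         # highest ROI first
  select(title, roi)            # display only these two columns

result
```

Each step receives the previous step's output as its first argument, so the order of the verbs matters: selecting only title and roi before filtering would drop the binary column that filter() needs.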
Ask another question of the data that can be answered with a data transformation pipeline.
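For example, one hypothetical question that reuses the same three verbs: among movies released after 2010 that fail the Bechdel test, which grossed the most in 2013 dollars? The question and the toy data below are illustrative only, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data; titles and values are made up.
movies <- tibble::tibble(
  title      = c("Movie A", "Movie B", "Movie C", "Movie D"),
  year       = c(2008, 2012, 2013, 2011),
  gross_2013 = c(4.0e8, 2.5e8, 6.1e8, 1.2e8),
  binary     = c("FAIL", "FAIL", "PASS", "FAIL")
)

answer <- movies |>
  # comma-separated conditions must all hold (logical AND)
  filter(year > 2010, binary == "FAIL") |>
  arrange(desc(gross_2013)) |>
  select(title, year, gross_2013)

answer
```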